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ABSTRACT 

We present a detailed comparison of fundamental dark matter halo properties retrieved 
by a substantial number of different halo finders. These codes span a wide range of techniques 
including friends-of-friends (FOF), spherical-overdensity (SO) and phase-space based algo- 
rithms. We further introduce a robust (and publicly available) suite of test scenarios that allows 
halo finder developers to compare the performance of their codes against those presented here. 
This set includes mock haloes containing various levels and distributions of substructure at a 
range of resolutions as well as a cosmological simulation of the large-scale structure of the 
universe. 

All the halo finding codes tested could successfully recover the spatial location of our 
mock haloes. They further returned lists of particles (potentially) belonging to the object that 
led to coinciding values for the maximum of the circular velocity profile and the radius where 
it is reached. All the finders based in configuration space struggled to recover substructure 
that was located close to the centre of the host halo and the radial dependence of the mass 
recovered varies from finder to finder Those finders based in p hase space could resolve central 
substructure although they found difficulties in accurately recovering its properties. Via a 
resolution study we found that most of the finders could not reliably recover substructure 
containing fewer than 30-40 particles. However, also here the phase space finders excelled by 
resolving substructure down to 10-20 particles. By comparing the halo finders using a high 
resolution cosmological volume we found that they agree remarkably well on fundamental 
properties of astrophysical significance (e.g. mass, position, velocity, and peak of the rotation 
curve). 

We further suggest to utilize the peak of the rotation curve f max as a proxy for mass given 
the arbitrariness in defining a proper halo edge. 

Key words: methods: iV-body simulations - galaxies: haloes - galaxies: evolution - cosmol- 
ogy: theory - dark matter 



1 INTRODUCTION 



1.1 The Necessity for a Comparison Project 



While recent decades have seen great progress in the understanding 
and modelling of the large- and small-scale structure of the Uni- 
verse by means of numerical simulation there remains one very 
fundamental question that is yet to be answered: "How to find a 
dark matter halo?" The comparison of any cosmological simulation 
to observational data relies upon reproducibly identifying "objects" 
within the model. But how do we identify "dark matter haloes" or 
even "galaxies" in such simulations? Researchers in the field have 
developed a wide variety of techniques and codes to accomplish 
this task. But how does the performance of these various techniques 
and codes compare? While we still may argue about the proper 
definition of an "object" the various approaches should neverthe- 
less agree once the same recipe for defining a (dark matter) halo is 
used. 

This introduction begins by establishing why it is important to 
have "The Halo-Finder Comparison Project" before continuing by 
laying out the groundwork for the comparison we have undertaken. 
It is therefore subdivided into a first subsection where we highlight 
the necessity for such a comparison and summarise the recent lit- 
erature in this area. This section also includes a brief primer on 
halo finders and their history. The second part introduces the de- 
sign of the test cases, illustrated with some analysis. The last part 
then raises the question "how to cross-compare haloes?" as well 
as "what is actually a halo?" and presents a possible answer the 
authors agreed upon. 



f E-mail: alexander.knebe@uam.es 



Over the last 30 years great progress has been made in the de- 
velopment of simulation codes that model the distribution of 
dissipationless dark matter while simultaneously following the 
(substantially more complex) physics of the baryonic compo- 
nent that accounts for the observable Universe. Nowadays we 
have a great variety of highly reliable, cost effective (and some- 
times publicly available) codes designed for the simula tion of cos 
mic structure formation (e.g. [C ouchman et al. 1995| 
Gnedinlll995l:lKravtsoy et alll997l:iFrvxell et al.il200(il: 



2000l: ISprinpl et al.l IZOoiT iKnebe et all l200lj: iTevssieJ |2002|: 



Peril 1 19951: 



Bode et al 



O'Shea et al. 2004; Ouilis 2004; Dubinski et al. 2004; Merzetal 



20051; ISpringel .2005.; Bagla & Khandai. ,200ft ; ,Springell I2OIOI; 
Doumler & Knebell201o|y 



However, producing the (raw) simulation data is only the first 
step in the process; the model requires reduction before it can be 
compared to the observed Universe we inhabit. This necessitates 
access to analysis tools to map the data onto "real" objects; 
traditionally this has been accomplished via the use of "halo 
finders". Conventional halo finders search the (dark) matter density 
field within the simulations generated by the aforementioned codes 
to find locally over-dense gravitationally bound systems, which 
are then tagged as (dark) matter haloes. Such tools have led to 
critical insights into our understanding of the origin and evolution 
of cosmic structure. To take advantage of sophisticated simulation 
codes and to optimise their predictive power one obviously needs 
equally sophisticated halo finders! Therefor e, this field has also 
seen great development in recent years (e.g. 'Gelb & Bertschingen 
1994; Klypin & Holtzman 1997; Eisenst ein & Hut 1998; Sta^ 
2OO1I ; iBullock et al.l I2OOII ; ISpringel et all I2OOII ; lAubert et al. 
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Figure 1. Schematic presentation of the (cumulative) number of halo finders 
as a function of time, binned in ten-year intervals since 1970. The codes 
participating in this comparison project have been highlighted in bold font. 



20041; Gill et all 2004 Weller et al.l l2005t iNevrinck et all 
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Kim & Park! l200f?: 


DiemandetalJ l2006l; IShaw et alj 


20071; 
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Macieiewski et al.l2009l; Habib et al. 
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2009; Ascasibar 2010; Behroozi" 201(1 


Planelles & Ouihs 2010; Sutter & Ricker 2010; Rasera et al. 201Q; 


Skorv et al.ll2010l;lFalck et al.ll201 ll see also Fis.fTl noting that for 



some halo finders no code paper exists yet). But so far comparison 
projects have tended to focus on the simulation codes themselves 
rather than the analysis tools. 

The increasing demand and supply for halo finders is schemat- 
ically presented in Fig. [T] where we show the (cumulative) number 
of codes as a function of time, binned in ten year intervals since 
1970. We can clearly see the increasing pace of development in 
the past decade reflecting the necessity for sophisticated codes: in 
the last ten years the number of existing halo finding codes has 
practically tripled. While for a long time the sph erical overd en- 



sity method first mentioned by IPress & Schechteil (S O, Il974 



well as the fr iend-of-friends algorithm introduced by iDavis et al] 
(FOF Il985l) remained the standard techniques, the situation 



Lacev&Colelll994 


;lvan 


Kamoedl 19951; iPfitzner & Salmorll 199^1; 


Klvoin & Holtzman 


|l99' 


k Eisenstein & Hull 19981; Gottlober et al.l 


1999|). 



While the first generation of halo finders primarily fo- 
cused on identifying isolated field haloes the situation dramati- 
cally changed once it became clear that there was no such thing 
as "overmerging", i.e. the prem ature destruction o f haloes or- 
biting inside larger host haloes ( lKlvpinetal.lll999h was a nu- 
merical artifact rather than a real physical process. Now codes 
faced the challenge of finding both haloes embedded within 
the (more or less uniform) background density of the Universe 
as well as subhaloes orbiting within a density gradient of a 



with this problem i Stadel 2001 ;'Bullock et al.ll200ll; Sprineel et al. 


20011; 1 Aubert et al.l 


2004.; ,Gill et all 2004 Weller et al. 




2005; 


Nevrinck et al. 2005 


; Kim & Park 20061; lOiemand et al. 




2006; 


Shaw et al."2007'; 'Gardner et al."2007a"b'; 'Macieiewski et al.| 


2005; 



Knollmann & Knebe 2009; Planelles & Ouilis 20 IC). Along with 
the need to identify subhaloes simulations became much larger dur- 
ing this period and this led to a drive towards parallel analysis tools. 
The simulation data had become too large to be analysed on single 
CPU architectures and hence halo finders had to be developed to 
cope with this situation, too. 

Nevertheless, the first two halo finders men tioned in the litera- 
ture, i .e. the spherical overdensity (SO) method dPress & Schechtei 



1974 ) and the friends-of-friends (FOF) algorithm jDavis et al 
198a) remain the foundation of nearly every code; they often in- 
volve at least one phase where either particles are linked together 
or (spherical) shells are grown to collect particles. While we do 
not wish to invent stereotypes or a classification scheme for halo 
finders there are unarguably two distinct groups of codes: 

• density peak locator (-1- particle collection) 

• particle collector 

The density peak locators - such as the classical SO method - aim 
at identifying by whatever means peaks in the matter density field. 
About these centres (spherical) shells are grown out to the point 
where the density profile drops below a certain pre-defined value 
normally derived from a spherical top-hat collapse. Most of the 
methods utilising this approach merely differ in the way they lo- 
cate density peaks. The particle collector codes ~ above all the FOF 
method - connect and link particles together that are close to each 
other (either in a 3D configuration or in 6D phase-space). They af- 
terwards determine the centre of this mass aggregation. 

After the initial selection has been made most methods ap- 
ply a pruning phase where gravitationally unbound particles are 
removed from the object. While this unbinding procedure is not 
essential for isolated field haloes it is vital for subhaloes in order 
to properly alleviate contamination by host halo particles. Further- 
more, for subhaloes it appears essential to define the first guess 
for bound particles upon a stable and reproducible criterion for the 
subhalo edge. One cannot extend the (spherical) shells out to the 
point where the density drops below some preselected multiple of 
the universal background density as this level will not be reached 
anymore; one needs to "truncate" the object beforehand, usually at 
the point where the density rises again due to the fact that the sub- 
halo is embedded within a host. Similarly, particle collecting codes 
which use simple "proximity" as a criterion for grouping particles 
need to adjust their yardsticks. However, the situation may be a bit 
more straightforward for 6D phase-space finders as we expect the 
velocity distributions of the host and the subhalo to be different. 

Driven by the explosion of high-quality observational data, 
simulations of cosmological structure formation have moved to in- 
creasingly high mass and force resolution. The simulation codes 
and techniques have been continuously refined over the past 
few decades providing us with methods that are akin yet dif- 
ferent: they all have to solve the coUisionless Boltzmann equa- 
tion simultaneously with Poisson's equation and the equations that 
govern gas physics. In order to verify their credibility the past 
few years have seen substantial efforts to inter-com pare the re- 
sults stemining from these different technique s (cf. Frenk et al] 
1999; Knebe etal. 2000; O'Sheaetal. 2005; Agertz et all |2007|; 
Heitmann et alj |2008| ; iTasker et all |2008|) . However, to date the 
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literature lacks a quantitative comparison of the various halo 
finding techniques. While some efforts have been d i rected to- 
iLacey &Colj|l994 White 20021: iGill et al l 
200a iLukic et aTlBoOft: .Tweed et alj|2009l : 



wards this goal (e.g 



2004 ICohn & White! 



Macieiewski et alj 20091 : Knollmann & Knebe these studies 



primarily scratched the surface and no-one has yet presented a con- 
clusive inter-comparison based upon a well defined test suite. In 
addition, we would like to stress again that the analysis of massive 
state-of-the-art simulations is a non-trivial task, especially when it 
comes to the detailed substructure of the haloes. Furthermore, var- 
ious definitions of the extent of a halo exist within the literature 
making compariso ns of the results from different groups far from 
straightforward (cf . IWhitel200lllLukic et al.l2009h . 

We though acknowledge that there is a body of literature 
available that has compared halo finder methods to theoreti- 
cal predic tions (e.g. Press & Schechter 1974: Lacey & Cole '19941; 
Sheth&Tormen 199 9: Jenkins et al. 2001; Ro bertson et al . 20091; 



Courtin et al. 2010lv While this is important work, it nevertheless 



rather often leads to halo finders being tuned to match theoretical 
expectations than testing the validity of the code in the first place; 
the theories have sometimes been used to answer "what halo def- 
inition is required to match theoretical expectations?". This may 
therefore mask important differences between simple linear theory 
and the full non-linear growth of structure in the Universe. In this 
paper, we focus instead on directly comparing different codes for 
halo finding and leave theoretical expectation aside. 

In summary, there is no clear definition of "what is a (dark) 
matter halo?" never mind "what is a subhalo?". Workers in the field 
of simulation analysis tend to utilise their own definitions and codes 
to study the properties of haloes in cosmological simulations. This 
paper aims at rectifying this situation by presenting the first ever 
coherent halo-finder comparison involving a substantial number of 
codes as well as providing the community with a well-defined set 
of test cases. However, we would like to caution the reader that 
the prime objective of this comparison is codes and not algorithms. 
Therefore, while certain codes may be based upon the same algo- 
rithm they still may yield (marginally) different results due to the 
individual realisation of that algorithm. 

1.2 The Workshop 

During the last week of May 2010 we held the workshop "Haloes 
going MAD" in Miraflores de la Sierra close to Madrid dedicated 
to the issues surrounding identifying haloes in cosmological sim- 
ulations. Amongst other participants 15 halo finder representatives 
were present. The aim of this workshop was to define (and use!) 
a unique set of test scenarios for verifying the credibility and re- 
liability of such programs. We applied each and every halo finder 
to our newly established suite of test cases and cross-compared the 
results. 

To date most halo finders were introduced (if at all) in their 
respective code papers which presented their underlying principles 
and subjected them to tests within a full cosmological environment 
(primarily matching (sub-)halo mass functions to theoretical mod- 
els and fitting functions) and hence no general benchmarks such 
as the ones designed at the workshop and presented below existed 
prior to our meeting. Our newly devised suite of test cases is de- 
signed to be simple yet challenging enough to assist in establish- 
ing and gauging the credibility and functionality of all commonly 
employed halo finders. These tests include mock haloes with well 
defined properties as well as a state-of-the-art cosmological simu- 
lation. They involve the identification of individual objects, various 



levels of substructure, and dynamically evolving systems. The cos- 
mological simulation has been provided at various resolution levels 
with the best resolved containing a sufficient number of particles 
(1024^) that it can only presently be analysed in parallel. 

All the test cases and the analy- 
sis presented here is publicly available from 
http : / /popia . f t . uam . es/HaloesGoingMAD under 
the tab "The Data". 



1.3 How to compare Haloes? 

One of the most crucial questions to address is obviously "How 
to define a halo?". This question is intimately related to "How do 
we fairly cross-compare the results of the various halo finders?". 
While we all agreed that the proper definition of a halo should be 
a "gravitationally bound object", how the size of a halo should be 
defined proved harder to agree upon. The "virial radius" is not a 
well-defined property as its precise definition can (and does) vary 
from halo finder to halo finderQ Furthermore, this quantity is ill- 
defined for subhaloes that live within the environment of a host 
halo. While there is some work available that allows for a conver- 
sion between commonly applied methods to calculate the mass of 
an isolated field halo (see e.g. White 2001; Lukic et al. 2009), such 
variations in definition will nevertheless lead to discrepancies in a 
cross-comparison and hence we decided to abandon the ambigu- 
ous definition for the edge of a halo and rather focus on a property 
that uniquely specifies the halo for the code comparison project: 
the peak of the rotation curve as characterised by Vmux and the ra- 
dial location of this peak -Rmax, respectively. It has been argued 
(e.g. lAscasibar & Gottl6berll2008l) that these quantities do indeed 
provide a physically-motivated scale for dark matter haloes, show- 
ing that, in contrast to the inner regions, there is substantial scatter 
in their physical properties, as well as significant systematic trends 
with halo mass and cosmic epoch, beyond the radius 7?max. 

However, utilizing fmax raises two obvious issues: firstly, as 
''max is reached quite close to the centre of the halo its measurement 
is obviously sensitive to resolution. Secondly, as the value of t^max 
is set by the central particles it is not very sensitive to tidal strip- 
ping. The relationship between J?max and i ^yir for a range of NFW 
halo concentrations is given in figure 6 of iMuldrew et al.l (1201 1[) . 
The resolution issue can be addressed by increasing the number of 
particles required when studying subhalo properties so that Wmax 
will always be resolved sufficiently and credibly. The relevance of 
the stripping issue though depends upon the questions to be asked 
of the simulation data: are we interested in a (stable) measure of 
the (original) infall mass of the subhalo or do we want to quan- 
tify the mass inside the tidal radius? For the comparison project 
we decided to evaluate iimax in order to have a stable quantity. 
We further agreed that this quantity is better related to observa- 
tional data as it is possible to observe rotation curves (and hence 
''max) whereas the same ambiguity applies to observers: what is the 
(outer) edge of a halo and/or galaxy? Nevertheless, we also decided 
to include A^part (i.e. the total number of gravitationally bound par- 
ticles as returned by the respective halo finder) in the comparison 
as a halo is (or should be) a gravitationally bound entity. The val- 
ues for Afpart are the ones directly returned by the halo finder and 
are based upon the internal criteria each code uses. How (and if) 



^ We like to add the cautionaiy remark that a lot of the properties and in 
particular any "radius" is based upon the assumption of spherical symmetry 
which is not valid for all halo finders presented here. 
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to perform the unbinding procedure and what particles to consider 
as belonging to the (sub-)halo were questions left for each group 
taking part to answer as they saw fit. For several groups these par- 
ticle lists would nomally be pruned further during an additional 
post-processing phase prior to obtaining halo properties. The num- 
bers given here therefore serve solely as an indicator of whether 
or not particles are missing and/or - in case of subhaloes - be- 
long to the host. In addition, we also used the list of particles be- 
longing to each halo to calculate a fiducial Af2oo value (defined 
via M(< r)/47rr'' = 200 x pcrit) considering the object in iso- 
lation, even for subhaloes: there are physical situ ations - l i ke the 
dynamical fricti on on infalling loose groups (e.g. :Read et aLj2008l ; 
ILux et al.|[2O10l) - where the (total) mass is the physically impor- 
tant quantity. Such examples of the limitation of the Umax value as 
a proxy for mass have also been witnessed in our test cases and we 
will come back to it in Section |4.1.3l 

The first preliminary comparisons focusing on spatial loca- 
tion, Umax, and the number of bound particles for the static mock 
haloes indicate that even though there exist a variety of different ap- 
proaches for halo finding, most of the codes agree with the known 
correct result well. If substructure is located close to the centre of 
the host halo all the codes tested experienced some difficulties in 
accurately recovering it, with all the finders based in 3D configu- 
ration space missing some material. For subhaloes placed near the 
vei7 centre of the host halo the more sophisticated 6D finders based 
in phase space, while correctly noting the existence of a substruc- 
ture often overestimated the associated mass due to confusion with 
material in the host halo. After proving to ourselves that we could 
all successfully reproduce the location and scale of a supplied mock 
halo we performed a resolution study where the mass and hence 
number of particles in a subhalo was gradually lowered. We found 
that practically all halo finders have a completeness limit of 30- 
40 particles; substructure objects smaller than this are not reliably 
found. Once we had established a firm baseline for our compar- 
isons we extended the study to consider a full cosmological volume 
at varying resolution. The results of this comparison are presented 
in Section |4]below after we first briefly introduce each of the halo 
finders involved in the comparison project in Section|2]and describe 
the set-up of our mock haloes in Section[3] Finally we wrap-up and 
present some conclusions in Section|5] 



2 THE CODES 

In this Section we are going to briefly present the codes that par- 
ticipated in the halo-finder comparison project. We highlight their 
main features allowing for a better understanding of any (possible) 
differences in the comparison Section |4] The prime information to 
be found in each code paragraph should be sufficient to understand 
how the algorithm works, how the initial particle content of a halo is 
obtained, the way the the (sub-)halo centre and edge are calculated, 
how the unbinding is performed and which method of parallelisa- 
tion has been applied. Please note that not all halo finders perform 
an unbinding, are parallelized or suitable to detect subhaloes. And 
we explicitly stress that this Section is neither intended as a review 
of all available halo finders nor an elaborate exposition of the par- 
taking codes; for the latter we refer the reader to the respective code 
papers referenced in the subsection of each halo finder 

As much as possible, the halo finders have been organised in 
terms of their methodology: spherical overdensity finders first fol- 
lowed by FOF-based finders with 6D phase-space finders last. This 



applies to both the presentation in this Section as well as the com- 
parison in Section|4] 



2.1 AHF (Knollmann & Knebe) 

The M PI-l-OpenMP parallelised h alo finder AHE0 (AMIGA Halo 
Finder, iKnollmann & Knebd[2009l) . is an improvement of the MHF 
halo finder dOill et al.ll2004 . which employs a recursively refined 
grid to locate local overdensities in the density field. The identi- 
fied density peaks are then treated as centres of prospective haloes. 
The resulting grid hierarchy is further utilized to generate a halo 
tree readily containing the information which halo is a (prospec- 
tive) host and subhalo, respectively. We therefore like to stress that 
our halo finding algorithm is fully recursive, automatically iden- 
tifying haloes, sub-haloes, sub-subhaloes, etc. Halo properties are 
calculated based on the list of particles asserted to be gravitationally 
bound to the respective density peak. To generate this list of parti- 
cles we employ an iterative procedure starting from an initial guess 
of particles. This initial guess is based again upon the adaptive grid 
hierarchy: for field haloes we start with considering all particles out 
to the iso-density contour encompassing the overdensity defined by 
the virial criterion based upon the spherical top-hat collapse model; 
for subhaloes we gather particles up to the grid level shared with an- 
other prospective (sub-)halo in the halo tree which corresponds to 
the upturn point of the density profile due to the embedding within 
a (background) host. This tentative particle list is then used in an 
iterative procedure to remove unbound particles: In each step of the 
iteration, all particles with a velocity exceeding the local escape ve- 
locity, as given by the potential based on the particle list at the start 
of the iteration, are removed. The process is repeated until no parti- 
cles are removed anymore. At the end of this procedure we are left 
with bona fide haloes defined by their bound particles and we can 
calculate their integral and profiled quantities. 

The only parameter to be tuned is the refinement criterion 
used to generate the grid hierarchy that serves as the basis for the 
halo tree and also sets the accuracy with which the centres are be- 
ing determined. The virial overdensity criterion applied to find the 
(field) halo edges is determined from the cosmological model of 
the data though it can readily be tailored to specific needs; for 
the analysis presented here we used 200 x pcrit. For more de- 
tails on the mode of operation and actual fun ctionality we refe r the 
reader to the two code desc ription papers bv lGill etal.l(l2004l) and 
IKnollmann & Knebd fioosi) . respectively. 



2.2 ASOHF (Planelles & Quilis) 

The ASOHF finder JPlanelles & OuiUsl2oiol) is based on the spher- 
ical overdensity (SO) approach. Although it was originally cre- 
ated to be coupled to an Eulerian cosmological code, in its ac- 
tual version, it is a stand-alone halo finder capable of analysing the 
outputs from cosmological simulations including different compo- 
nents (i.e., dark matter, gas, and stars). The algorithm takes advan- 
tage of an AMR scheme to create a hierarchy of nested grids placed 
at different levels of refinement. All the grids at a certain level, 
named patches, share the same numerical resolution. The higher 
the level of refinement the better the numerical resolution, as the 
size of the numerical cells gets smaller. The refining criteria are 
open and can be chosen depending on the application. For a gen- 
eral purpose, ASOHF refines when the number of particles per cell 

AHF is freely available from http : / /www . popia . ft . uam . es /AMIGA 
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exceeds a user defined parameter. Once the refinement levels are 
set up, the algorithm applies the SO method independently at each 
of those levels. The parameters needed by the code are the follow- 
ing: i) the cosmological parameters when analysing cosmological 
simulations, ii) the size of the coarse cells, the maximum number of 
refinement levels (Nisveis), and the maximum number of patches 
(Npatch) for all levels in order to build up the AMR hierarchy of 
nested grids, iii) the number of particles per cell in order to choose 
the cells to be refined, and iv) the minimum number of particles in 
a halo. 

After this first step, the code naturally produces a tentative list 
of haloes of different sizes and masses. Moreover, a complete de- 
scription of the substructure (haloes within haloes) is obtained by 
applying the same procedure on the different levels of refinement. 
A second step, not using the cells but the particles within each halo, 
makes a more accurate study of each of the previously identified 
haloes. These prospective haloes (subhaloes) may include particles 
which are not physically bound. In order to remove unbound par- 
ticles, the local escape velocity is obtained at the position of each 
particle. To compute this velocity we integrate Poisson equation 
assuming spherical symmetry. If the velocity of a particle is higher 
than the escape velocity, the particle is assumed to be unbound and 
is therefore removed from the halo (subhalo) being considered. Fol- 
lowing this procedure, unbound particles are removed iteratively 
along a list of radially ordered particles until no more of them need 
to be removed. In the case that the number of remaining particles is 
less than a given threshold the halo is dropped from the list. 

After this cleaning procedure, all the relevant quantities for 
the haloes (subhaloes) as well as their evolutionary merger trees 
are computed. The lists of (bound) particles are used to calculate 
canonical properties of haloes (subhaloes) like the position of the 
halo centre, which is given by the centre of mass of all the bound 
particles, and the size of the haloes, given by the distance of the 
farthest bound particle to the centre. 

The ability of the ASOHF method to find haloes and their 
substructures is limited by the requirement that appropriate refine- 
ments of the computational grid exist with enough resolution to 
spot the structure being considered. In comparison to algorithms 
based on linking strategies, ASOHF does not require a linking 
length to be defined, although at a given level of refinement the size 
of the cell can be considered as the linking length of this particular 
resolution. 

The version of the code used in this comparison is serial, al- 
though there is already a first parallel version based on OpenMP. 

2.3 BDM (Klypin & Ceverino) 

T he Bound Density Maxima ( BDM) halo finder originally described 
in lKlvpin & HoltzmanI il997h uses a spherical 3D overdensity al- 
gorithm to identify haloes and subhaloes. It starts by finding the 
local density at each individual particle position. This density is 
defined using a top-hat filter with a constant number of particles 
A'flitcr, which typically is A^'aiter = 20. The code finds all maxima 
of density, and for each maximum it finds a sphere containing a 
given overdensity mass Ma = (47r/3) Apcri?A' where per is the 
critical density and A is the specified overdensity. 

For the identification of distinct haloes, the code uses the den- 
sity maxima as halo centres; amongst overlapping sphere the code 
finds the one that has the deepest gravitational potential. Haloes are 
ranked by their (preliminary) size and their final radius and mass 
are derived by a procedure that guarantees smooth transition of 
properties of small haloes when they fall into a larger host) halo 



becoming subhaloes: this procedure either assigns Ra or i?dist as 
the radius for a cuiTently infalling halo as its radius depending on 
the environmental conditions, where i?dist measures the distance 
of the infalling halo to the surface of the soon-to-be host halo. 

The identification of subhaloes is a more complicated proce- 
dure: centres of subhaloes are certainly density maxima, but not all 
density maxima are centres of subhaloes. BDM eliminates all den- 
sity maxima from the list of subhalo candidates which have less 
than TVflitcr self-bound particles. For the remaining set of prospec- 
tive subhaloes the radii are determined as the minimum of the fol- 
lowing three distances: (a) the distance to the nearest barrier point 
(i.e. centres of previously defined (sub-)haloes), (b) the distance 
to its most remote bound particle, and (c) the truncation radius (i.e. 
the radius at which the average density of bound particles has an in- 
flection point). This evaluation involves an iterative procedure for 
removing unbound particles and starts with the largest density max- 
imum. 

The unbinding procedure requires the evaluation of the gravi- 
tational potential which is found by first finding the mass in spheri- 
cal shells and then by integration of the mass profile. The binning is 
done in log radius with a very small bin size of A log(i?) — 0.005. 

The bulk velocity of either a distinct halo or a subhalo is de- 
fined as the average velocity of the 30 most bound particles of that 
halo or by all particles, if the number of particles is less than 30. 
The number 30 is a compromise between the desire to use only the 
central (sub)halo region for the bulk velocity and the noise level. 

The code uses a domain decomposition for MPI paralleliza- 
tion and OpenMP for the parallelization inside each domain. 

2.4 pSO (Sutter & Ricker) 

The parallel spherical overdensity (pSO) halo finder is a fast, highly 
scalable MPI-parallelized tool directly integrated into the FLASH 
simulation code that is designed to provide on-the-fly halo find- 
ing for use in subgrid modeling, merger tree analysis, and adaptive 
refinement schemes jSutter & RickerllzOIOl) . The pSO algorithm 
identifies haloes by growing SO spheres. There are four adjustable 
parameters, controlling the desired overdensity criteria for centre 
detection and halo size, the minimum allowed halo size, and the 
resolution of the halo radii relative to the grid resolution. The algo- 
rithm discovers halo centres by mapping dark matter particles onto 
the simulation mesh and selecting cell centres where the cell den- 
sity is greater than the given overdensity criterion. The algorithm 
then determines the halo edge using the SO radius by collecting 
particles using the FLASH AMR tree hierarchy. The algorithm de- 
termines the halo centre, bulk velocity, mass, and velocity disper- 
sion without additional post-processing. pSO is provided as both 
an API for use in-code and as a stand-alone halo finder. 

2.5 LANL (Lukic, Fasel & Hsu) 

The LANL halo finder is developed to provide on-the-fly halo anal- 
ysis for simulations utilizing h undreds of billions of particles, and 
is integrated into the MC' code l lHabib et alj|2009l) . although it can 
also be used as a stand-alone halo finder. Its core is a fast A;D-tree 
FOF halo finder which uses 3D (block), structured decomposition 
to minimize surface to volume ratio of the domain assigned to each 
process. As it is aimed at large-scale structure simulations (lOO-l- 
Mpc/h on the side), where the size of any single halo is much 
smaller than the size of the whole box, it uses the concept of "ghost 
zones" such that each process gets all the particles inside its do- 
main as well as those particles which are around the domain within 
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a given distance (the overload size, a code parameter chosen to be 
larger then the size of the biggest halo we expect in the simula- 
tion). After each process runs its serial version of a FOF finder, 
MPI based "halo stitching" is performed to ensure that every halo 
is accounted for, and accounted for only once. 

If desired, spherical "SO" halo properties can be found using 
the FOF haloes as a proxy. Those SO haloes are centred at the parti- 
cle with the lowest gravitational potential, while the edge is at _Ra 
- the radius enclosing an overdensity of A. It is well known that 
percolation based FOF haloes suffer from the over-bridging prob- 
lem; therefore, if we want to ensure completeness of our SO sample 
we should run FOF with a smaller linking length than usual in or- 
der to capture all density peaks, but still avoid over-bridging at the 
scale of interest (which depends on our choice of A). Overlapping 
SO haloes are permitted, but the centre of one halo may not reside 
inside another SO halo (that would be considered as a substructure, 
rather than a "main" halo). The physical code parameters are the 
linking length for the FOF haloes, and overdensity parameter A 
for SO haloes. Technical parameters are the overload size and the 
minimum number of particles in a halo. 

The LANL halo finder is being included in the standard dis- 
tributions of PARAVIEvlfl package, enabling researchers to com- 
bine analysis and visualization of their simulations. A substructure 
finder is cuiTently under development. 

2.6 SUBFIND (lannuzzi , Springel & Dolag) 

SUBFIND l lSpringel et al.ll200lh identifies gravitationally bound, 
locally overdense regions within an input parent halo, traditionally 
provided by a FOF group finder, although other group finders could 
be used in principle as well. The densities are estimated based on 
the initial set of all particles via adaptive kernel interpolation based 
on a number A^dcns of smoothing neighbours. For each particle, the 
nearest A^ngb neighbours are then considered for identifying local 
overdensities through a topological approach that searches for sad- 
dle points in the isodensity contours within the global field of the 
halo. This is done in a top-down fashion, starting from the parti- 
cle with the highest associated density and adding particles with 
progressively lower densities in turn. If a particle has only denser 
neighbours in a single structure it is added to this region. If it is 
isolated it grows a new density peak, and if it has denser neigh- 
bours from two different structures, an isodensity contour that tra- 
verses a saddle point is identified. In the latter case, the two in- 
volved structures are joined and registered as candidate subhaloes 
if they contain at least A'^ngb particles. These candidates, selected 
according to the spatial distribution of particles only, are later pro- 
cessed for gravitational self-boundness. Particles with positive to- 
tal energy are iteratively dismissed until only bound particles re- 
main. The gravitational potential is computed with a tree algorithm, 
such that large haloes can be processed efficiently. If the remaining 
bound number of particles is at least A'^ngb, the candidate is ulti- 
mately recorded as a subhalo. The set of initial substructure can- 
didates forms a nested hierarchy that is processed from inside out, 
allowing the detection of substructures within substructures. How- 
ever, a given particle may only become a member of one substruc- 
ture, i.e. SUBFIND decomposes the initial group into a set of dis- 
joint self-bound structures. Particles not bound to any genuine sub- 
structure are assigned to the "background halo". This component is 
also checked for self-boundness, so that some particles that are not 

^ http://www.paraview.org/ 



bound to any of the structures may remain. For all substructures 
as well as the main halo, the particle with the minimum gravita- 
tional potential is adopted as (sub)halo centre. For the main halo, 
SUBFIND additionally calculates a SO virial mass around this cen- 
tre, taking into account all particles in the simulation (i.e. not just 
those in the FOF group that is analyzed). There exist both serial 
and MPI-parallelized versions of SUBFIND, which implement the 
same underlyi ng algorithms. For mo re details we refer the reader 
to the paper bv lSpringel et al.l(l200lh . 



2.7 FOF (Gottlober & Turchaninov) 

In order to analyse large cosmological simulations with up to 2048'' 
particles we have developed a new MPI version of the hierarchical 
Friends-Of-Friends algorithm with low memory requests. It allows 
us to construct very fast clusters of particles at any overdensity 
(represented by the linking length) and to deduce the progenitor- 
descendant-relationship for clusters in any two different time steps. 
The particles in a simulation can consist of different species (dark 
matter, gas, stars) of different mass. We consider them as an undi- 
rected graph with positive weights, namely the lengths of the seg- 
ments of this graph. For simplicity we assume that all weights are 
different. Then one can show that a unique minimum spanning tree 
(MST) of the point distribution exists, namely the shortest graph 
which connects all points. If subgraphs cover the graph then the 
MST of the graph belongs to the union of MSTs of the subgraphs. 
Thus subgraphs can be constructed in parallel. Moreover, the geo- 
metrical features of the clusters, namely the fact that they occupy 
mainly almost non-overlapping volumes, allow the construction of 
fast parallel algorithms. If the MST has been constructed all pos- 
sible clusters at all linking lengths can be easily determined. To 
represent the output data we apply topological sorting to the set of 
clusters which results in a cluster ordered sequence. Every cluster 
at any linking length is a segment of this sequence. It contains the 
distances between adjacent clusters. Note, that for the given MST 
there exist many cluster ordered sequences which differ in the or- 
der of the clusters but yield the same set of clusters at a desired 
linking length. If the set of particle-clusters has been constructed 
further properties (centre of mass, velocity, shape, angular momen- 
tum, orientation etc.) can be directly calculated. Since this concept 
is by construction aspherical a circular velocity (as used to char- 
acterise objects found with spherical overdensity algorithms) can- 
not be determined here. The progenitor-descendant-relationship is 
calculated for the complete set of particles by comparison of the 
cluster-ordered sequences at two different output times. 

The hierarchical FOF algorithm identifies objects at different 
overd ensities depending on the chosen linking length jMore et afl 
l20Ilh . In order to avoid artificial misidentifications of subhaloes 
on high overdensities one can add an additional criterion. Here we 
have chosen the requirement that the spin parameter of the subhalo 
should be smaller than one. All subhaloes have been identified at 
512 times the virial overdensity. Thus only the highest density peak 
has been taken into account for the mass determination and the size 
of the object, which are therefore underestimated. The velocity of 
the density peak is estimated correctly but without removing un- 
bound particles. 



© 2010 RAS, MNRAS 000.[Tll27l 



8 Knebe et al. 



2.8 pFOF (Rasera & Roy) 

Parallel FOF (pFOF) is a MPI-based parallel Friends-of-Friends 
halo finder which is used within the DEUS Consortium|f| at LUTH 
(Laboratory Universe and Theories). It has been parallelized by 
Roy and was u sed for several studie s involving large Ai'- body simu- 
lations such as ICourtin et al1 l l2010h : lRa"sera et al.lhoiOl) . The prin- 
ciple is the following: first, particles are distributed in cubic subvol- 
umes of the simulation and each processor deals with one "cube", 
and runs Friends-of-Friends locally. Then, if a structure is located 
close to the edge of a cube, pFOF checks if there are particles be- 
longing to the same halo in the neighbouring cube. This process 
is done iteratively until all haloes extending across multiple cubes 
have been merged. Finally, particles are sorted on a per halo basis, 
and the code writes two kinds of output: particles sorted per region, 
particles sorted per halo. This makes any post-processing straight- 
forward because each halo or region can be analysed individually 
on a single CPU server. pFOF was successfully tested on up to 
4096 Bluegene/P cores with a 2048"^ particles A^-body simulation. 
In this article, the serial version was used for mock haloes and small 
cosmological simulations, and the parallel version f or larger runs. 
The l inking length was set to fe = 0.2 (however see lCourtin et al.l 
l201Cl for a discussion on the halo definition), and the minimum 
halo mass to 100 particles. And the halo centres reported here are 
the centre-of-mass of the respective particle distribution. 



2.9 Ntropy-fofsv (Gardner, McBride & Stinson) 

The Ntropy parallel programming framework is derived from N- 
body codes to help address a broad range of astrophysical problems 
0. This includes an implementation of a simple but efficient FOF 
halo finder, Ntropy -fo fsv, which is more f ully described in 
[Gardner et all ( l2007al) and lCardner et al.l(l2007bb . Ntropy provides 
a "distributed shared memory" (DSM) implementation of a fcD- 
tree, where the application developer can reference tree nodes as if 
they exist in a global address space, even though they are physically 
distributed across many compute nodes. Ntropy uses the fcD-tree 
data structures to speed up the FOF distance searches. It also em- 
ploys an implementation of the lShiloach & VishkinI ( Il982h parallel 
connectivity algorithm to link together the haloes that span separate 
processor domains. The advantage of this method is that no single 
computer node requires knowledge of all of the groups in the simu- 
lation volume, meaning that Ntropy-fofsv is scalable to petas- 
cale platforms and handle large data input. This algorithm was used 
in the mock halo test cases to stitch together particle groups found 
across many threads into the one main FOF halo. As FOF is a deter- 
ministic algorithm, Ntropy-fofsv takes a single physical link- 
ing length to group particles into FOF haloes without any perform- 
ing particle unbinding or subhalo identification. The halo centres 
for the analysis presented here use centre-of-mass estimates based 
on the FOF particle list. Ntropy achieves parallelisation by call- 
ing "machine dependent library" (MDL) that consists of high-level 
operations such as "acquire_treenode" or "acquire_particle." This 
library is rewritten for a variety of models (MPI, POSIX Threads, 
Cray SHMEM, etc.), allowing the framework to extract the best 
performance from any parallel architecture on which it is run. 



www.deus-consortium.org 
^ http://www.phys.washington.edu/users/gardnerj/ntropy 



2.10 VOBOZ (Neyrinck) 

Conc eptually, a VOBOZ (VOronoi BOund Zones. iNevrinck et al.l 
l2005h halo or subhalo is a density peak surrounded by gravitation- 
ally bound particles that are down steepest-density gradients from 
the peak. A statistical significance is measured for each (sub)halo, 
based on the probability that Poisson noise would produce it. 

The only physical parameter in VOBOZ is the density thresh- 
old characterizing the edge of (parent) haloes (set to 200 times 
the mean density here), which typically only affects their mea- 
sured masses. To return a definite halo catalog, we also impose 
a statistical-significance threshold (set to A-cr here), although de- 
pending on the goal of a study, this may not be necessary. 

Density peaks are found using a Voronoi tessellation (par- 
allelizable by splitting up the volume), which gives an adap- 
tive, parameter- free estimate of each particle's d ensity and set of 
neighbours (e.g. lSchaap & van de Wevgaertl2000l) . Each particle is 
joined to the peak particle (whose position is returned as the halo 
centre) that lies up the steepest density gradient from that particle. 
A halo associated with a high density peak will also contain smaller 
density peaks. The significance of a halo is judged according to the 
ratio of its central density to a saddle point joining the halo to a halo 
with a higher central density, comparing to a Poisson point process. 
Pre-unbinding (sub)halo boundaries are defined along these density 
ridges. 

Unbinding evaporates many spurious haloes, and often brings 
other halo boundaries inward a bit, reducing the dependence on the 
outer density contrast. Particles not gravitationally bound to each 
halo are removed iteratively, by comparing their potential energies 
(measured as sums over all other particles) to kinetic energies with 
respect to the velocity centroid of the halo's core (i.e. the particles 
that directly jump up density gradients to the peak). The unbinding 
is parallelized using OpenMP. In the cosmological test, we remove 
haloes with fewer than 20 particles from the VOBOZ halo list. 



2.11 ORIGAMI (Falck, Neyrinck & Aragon-Calvo) 

ORIG AMI (Order-Reve rsing Gravity, Apprehended Mangling In- 
dices, iFalck etalj|20Tlh uses a natural, parameter-free definition 
of the boundary between haloes and the non-halo environment 
around them: halo particles are particles that have experienced 
shell-crossing. This dynamical definition does not make use of the 
density field, in which the boundary can be quite ambiguous. In one 
dimension, shell crossings can be detected by looking for pairs of 
particles whose positions are out-of-order compared with their ini- 
tial positions. In 3D, then, a halo particle is defined as a particle that 
has undergone shell crossings along 3 orthogonal axes. Similarly, 
this would be 2 axes for a filament, I for a wall, and for a void. 
There is a huge number of possible sets of orthogonal axes in the 
initial grid to use to test for shell-crossing, but we only used four 
simple ones, which typically suffice to catch all the shell-crossings. 
We used the Cartesian x, y, and z axes, as well as the three sets of 
axes consisting of one Cartesian axis and two (45° ) diagonal axes 
in the plane perpendicular to it. 

Once halo particles have been tagged, there are many possible 
ways of grouping them into haloes. For this paper, we grouped them 
on a Voronoi tessellation of final-condit ions particle positions. This 
gives a natural density estimate (e.g. ISchaap & van de Wevgaerll 
l200Ct VTFE, Voronoi Tessellation Field Estimator) and set of 
neighbours for each particle. Haloes are sets of halo particles con- 
nected to each other on the Voronoi tessellation. To prevent haloes 
from being unduly linked, we additionally require that a halo con- 
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tain at most one halo "core", defined as a set of particles connected 
on the tessellation that all exceed a VTFE density threshold. This 
density threshold is the only parameter in our algorithm, since the 
initial tagging of halo particles is parameter-free; for this study, we 
set it to 200 times the mean density. We partition connected groups 
of halo particles with multiple cores into haloes as follows: each 
core iteratively collects particles in concentric rings of Voronoi 
neighbours until all halo particles are associated. The tagging pro- 
cedure establishes halo boundaries, so no unbinding procedure is 
necessary. Also, we note that currently, the algorithm does not iden- 
tify subhaloes. We remove haloes with fewer than 20 particles from 
the ORIGAMI halo catalogue, and the halo centre reported is the 
position of the halo's highest-density particle. 

Please note that due to its nature ORIGAMI is only applicable 
to cosmological simulations and hence only enters the comparison 
project in the respective Section \4~2\ 



2.12 SKID (Stadel & Potter) 

S KID (Spline Ker n el Int erpolative Denmaxfl first mentioned 
Govemato et al.l ( 1 19971) and extensively described in IStadell 



( l200ll) . finds density peaks within A'^-body simulations and sub- 
sequently determines all associated bound particles thereby iden- 
tifying haloes. It is important to stress that SKID will only find 
the smallest scale haloes within a hierarchy of haloes as is gener- 
ally seen in cosmol ogical structure formation simulations. Unlike 
original DENMAX jBertschinger & Gelblll 99 lUOelbll 19921) which 
used a fixed grid based density estimator, SKID uses SPH (i.e., 
smoothed particle hydrodynamics) kernel averaged densities which 
are much better suited to the Lagrangian nature of A^-body simu- 
lations and allow the method to locally adapt to the large dynamic 
range found in cosmological simulations. 

Particles are slowly slid (each step moving the particles by 
a distance of order the softening length in the simulation) along 
the local density gradient until they pool at a maximum, each pool 
corresponding to each initial group. This first phase of SKID can 
be computationally very expensive for large simulations, but is also 
quite robust. 

Each pool is then "unbound" by iteratively evaluating the 
binding energy of every particle in their original positions and then 
removing the most non-bound particle until only bound particles 
remain. This removes all particles that are not part of substructure 
either because they are part of larger scale structure or bee ause they 
are part of the background. 

SKID can also identify structure composed of gas and stars 
in hydrodynamical simulations using the dark matter only for its 
gravitational binding effect. The "Haloes going MAD" meeting has 
motivated development of an improved version of the algorithm 
capable of also running on parallel computers. 



2.13 AdaptaHOP (Tweed & Colombi) 

The co de AdaptaHOP is described in Appendix A of lAubert et al.l 
( |2004|) . The first step is to compute an SPH density for each particle 
from the 20 closest neighbours. Isolated haloes are then described 
as groups of particles above a density threshold pt, where this pa- 
rameter is set to 80, which closely matches results of a FOE group 
finder with parameter b — 0.2. To identify subhaloes within those 

^ The OpenMP parallelized version of SKID can be freely downloaded 
from http : //www . hpcf orge . org 



groups, local density maxima and saddle points are detected. Then, 
by increasing the density threshold, it is a simple matter to decom- 
pose haloes into nodes that are either density maxima, or groups of 
particles whose density is between two values of saddle points. A 
node structure tree is then created to detail the whole structure of 
the halo itself. Each leaf of this tree is a local density maximum and 
can be interpreted as a subhalo. However, further post-processing 
is needed to define the halo structure tree, describing the host halo 
itself, its subhaloes an d subhaloes within subhaloes. This part of 
the code is detailed in iTweed et al.l ( |2009|) : the halo structure tree 
is constructed so that the halo itself contains the most massive lo- 
cal maximum (Most massive Sub maxima Method: MSM). This 
method gives the best result for isolated snapshots, as used in this 
paper. 

In more detail, AdaptaHOP needs a set of seven parameters. 
The first parameter is the number of neighbours 7i„ei used with a 
fcD-tree scheme in order to estimate the SPH density. Among these 
n„ei neighbours, the rihop closest are used to sweep through the 
density field and detect both density maxima and saddle points. 
As previously mentioned, the parameter pt sets the halo bound- 
ary. The decomposition of the halo itself into leaves that are to 
be redefined as subhaloes has to fulfil certain criteria set by the 
remaining four parameters. The most relevant is the statistical 
significance threshold, set via the parameter fudge, defined via 
{{p) — pt)/ pt > fudgej \fN , where N is the number of particles 
in the leaves. The minimal mass of a halo is limited by the param- 
eter Timembers, the minimum number of particles in a halo. Any 
potential subhalo has also to respect two conditions with respect to 
the density profile and the minimal radius, through the parameters 
Q and /e. These two values ensure that a subhalo has a maximal 
density pmai! such as pmai > q(p) and a radius greater than 
times the mean interparticle distance. We used the following set of 
parameters (n„ei = rihop = 20, pt = 80, fudge — 4, a — 1, 
ft = 0.05, nmiimbers ~ 20). It is important to understand that all 
nodes are treated as leaves and must comply with aforementioned 
criteria before being further decomposed into separate structures. 
As for defining haloes and subhaloes themselves, this is done by 
grouping linked lists of particles coiTesponding to different nodes 
and leaves from the node structure tree. Eurther, the halo and sub- 
halo centres are defined as the position of the particle with the high- 
est density. The halo edge corresponds to the pt density threshold, 
whereas the saddle points define the subhalo edge. 

Please note that AdaptaHOP is a mere topological code that 
does not feature an unbinding procedure. Eor substructures (whose 
boundaries are chosen from the saddle point value) this may impact 
on the estimate of the mass as well as lead to contamination by host 
particles. 



2.14 HOT (Ascasibar) 

This algorithm, still under development, computes the Hierarchi- 
cal Overdensity Tree, (HOT), of a point distribution in an arbitrary 
multidimensional space. HOT is introduced as an alternative to the 
minimal spanning tree (MST) for spaces where a metric is not well 
defined, like the phase space of particle positions and velocities. 

The method is based on the Eield Estimator for Arbitrar y 
Spaces (EiEstAS, lAscasibar & BinnevI l2005l : lAscasibad I2OI0I) . 
Eirst, the space is tessellated one dimension at a time, until it is 
divided into a set of hypercubical cells containing exactly one par- 
ticle. Particles in adjacent cells are considered as neighbours. Then, 
the mass of each point is distributed over an adaptive smoothing 
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kernel as described in lAscasibadteOlOl) , which provides a key step 
in order to define a metric. 

In the HOT+FiEstAS scheme, objects coixespond to the 
peaks of the density field, and their boundaries are set by the iso- 
density contours at the saddle points. At each saddle point, the ob- 
ject containing less particles is attached to the most massive one, 
which may then be incorporated into even more massive objects 
in the hierarchy. This idea can be implemented by computing the 
MST of the data distribution, defining the distance between two 
neighbouring particles as the minimum density along an edge con- 
necting them (i.e. the smallest of the two densities, or the density 
of the saddle point when it exists). However, this is not practical 
for two reasons. Firstly, defining a path between two particles is 
not trivial when a metric is not available. Secondly, finding the sad- 
dle points would require a minimisation along the path, which is 
extremely time consuming when a large number of particles is in- 
volved. These problems may be overcome if the distance between 
two data points is given by the average density within the hyperbox 
they define. 

Once the distances are defined in this way, HOT+FiEstAS 
computes the MST of the data distribution by means of Kmskal's 
algorithm ( I Kruskall 119561) . The output of the algorithm consists of 
the tree structure, given by the parent of each data point in HOT, 
and a catalogue containing an estimate of the centroid (given by the 
density-weighted centre of mass) as well as the number of particles 
in the object (both including and excluding substructures). In or- 
der to discard spurious density fluctuations, a minimum number of 
points and density contrast are required for an object to be output to 
the catalogue. Currently, these parameters are set to A'^ > 20 par- 
ticles and a contrast threshold Ppcak/pbackground > 5. Although 
these values seem to yield reasonable results, more experimenta- 
tion is clearly needed. 

In this work, the algorithm is applied to the particle posi- 
tions only (HOT 3D) as well as the full set of phase-space coordi- 
nates (H0T6D). Since it is intended as a general data analysis tool, 
not particularly optimised for the problem of halo identification, it 
should not (and does not) take into account any problem-specific 
knowledge such as the concepts of binding energy or virial radius. 
The latter quantity, as well as the maximum circular velocity, have 
been computed from the raw particle IDs returned by the code. 

The definition of object boundaries in terms of the saddle 
points of the density field will have a relatively mild impact in the 
results concerning the mock haloes, but it is extremely important 
in the cosmological case. HOT-l-FiEstAS will, for instance, iden- 
tify large-scale filamentary structures that are not considered haloes 
by any of the other algorithms (although many of these objects are 
indeed gravitationally bound). 

On the other hand, keeping unbound particles will be an issue 
for subhaloes close to the centre of their host, especially in three 
dimensions, and a post-processing script will be developed to per- 
form this task. 

Please note that due to its present implementation HOT is not 
yet applicable to cosmological simulations and hence only enters 
the comparison project in the mock halo Section \4~l\ 

2.15 HSF (Maciejewski) 

The Hierarchical Structure Finder (HSF. [Maciejewski et al.|[2009l) 
identifies objects as connected self-bound particle sets above some 

^ H0T3D does not even read particle velocities 



density threshold. This method consists of two steps. Each particle 
is first linked to a local DM phase-space density maximum by fol- 
lowing the gradient of a particle-based estimate of the underlying 
DM phase-space density field. The particle set attached to a given 
maximum defines a candidate structure. In a second step, particles 
which are gravitationally unbound to the structure are discarded 
until a fully self-bound final object is obtained. 

In the initial step the phase-space density and phase-space gra- 
dients are estimated by using a six-dimensional SPH smoothing 
kerne l with a local adaptive metr ic as implemented in the EnBiD 
code jSharma & Steinmetj2006l) . For the SPH kernel we use iVsph 
between 20 and 64 neighbours whereas for the gradient estimate 
we use A'ngb = 20 neighbours. 

Once phase-space densities have been calculated, we sort the 
particles according to their density in descending order. Then we 
start to grow structures from high to low phase-space densities. 
While walking down in density we mark for each particle the two 
closest (according to the local phase-space metric) neighbours with 
higher phase-space density, if such particles exist. In this way we 
grow disjoint structures until we encounter a saddle point, which 
can be identified by observing the two marked particles and see- 
ing if they belong to different structures. A saddle point occurs at 
the border of two structures. According to each structure mass, all 
the particles below this saddle point can be attached to only one 
of the structures if it is significantly more massive than the other 
one, or redistributed between both structures if they have compa- 
rable masses. This is controlled by a simple but robust cut or grow 
criterion depending on a connectivity parameter a which is rang- 
ing from 0.2 up to 1.0. In addition, we test on each saddle point 
if structures are statistically significant when compared to Poisson 
noise (controlled by a /3 parameter). At the end of this process, we 
obtain a hierarchical tree of structures. 

In the last step we check each structure against an unbind- 
ing criterion. Once we have marked its more massive partner for 
each structure, we sort them recursively such that the larger part- 
ners (parents) are always after the smaller ones (children) . Then we 
unbind structure after structure from children to parents and add 
unbound particles to the larger partner. If the structure has less than 
TVcut = 20 particles after the unbinding process, then we mark it 
as not bound and attach all its particles to its more massive partner 
(note, that a smaller A^cut is used for the resolution study in Sec- 
tion l4.1.4t . The most bound particle of each halo/subhalo defines 
its position centre. 

Although HSF can be used on the entire volume, to speed up 
the process of identification of the structures in the cosmological 
simulation volume we first apply the FOF method to disjoint the 
particles into smaller FOF groups. 

2.16 6DFOF (Zemp & Diemand) 

6DF0F is a simple extension of the well known FOF method which 
also includes a proximity condition in velocity space. Since the cen- 
tres of all resolved haloes and subhaloes reach a similar peak phase 
space density they can all be found at once with 6D F0F. The al- 
gorithm was first presented in lOiemand et al.l (|2006|) . The 6DF0F 
algorithm links two particles if the following condition 

(^^i^ + (^^i^ < 1 (1) 

is fulfilled. There are three free parameters: Ax, the linking length 
in position space, Aii, the linking length in velocity space, and 
A^'min, the minimum number of particles in a linked group so that 
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it will be accepted. For Ai) — >■ oo it reduces to the standard FOF 
scheme. The 6DF0F algorithm is used for finding the phase space 
coordinates of the high phase space density cores of haloes on all 
levels of the hierarchy and is fully integrated in parallel with in the 
MPI and OpenMP parallelised code PKDGRAV ( IStadell200ll) . 

The centre position and velocity of a halo are then determined 
from the linked particles of that halo. For the centre position of 
a halo, one can choose between the following three types: 1) the 
centre-of-mass of its linked particles, 2) the position of the particle 
with the largest absolute value of the potential among its linked 
particles or 3) the position of the particle which has the largest local 
mass density among its linked particles. For the analysis presented 
here, we chose type 3) as our halo centre position definition. The 
centre velocity of a halo is calculated as the centre-of-mass velocity 
of its linked particles. Since in 5DF0F only the particles with a high 
phase space density in the very centre of each halo (or subhalo) are 
linked together, it explains the somewhat different halo velocities 
(compared to the other halo finders) and slightly offset centres in 
cases only a few particles were linked. 

Other properties of interest (e.g. mass, size or maximum of 
the circular velocity curve) and the hierarchy level of the individual 
haloes are then detemined by a separate profiling routine in a post 
processing step. For example, a characteristic size and mass scale 
definition (e.g. r2ooc and M2ooc) for field haloes based on tradi- 
tional spherical overdensity criteria can be specified by the user. 
For subhaloes, a truncation scale can be estimated as the location 
where the mass density profile reaches a user specified slope. Dur- 
ing the profiling step no unbinding procedure is performed. Hence, 
the profiling step does not base its (sub-)halo properties upon parti- 
cle lists but rather on spherical density profiles. Therefore, 6DF0F 
directly returned halo properties instead of the (requested) particle 
ID Usts. 



2.17 Rockstar (Behroozi) 

Rockstar is a new phase-space based halo finder designed to 
maximize halo consistency across timesteps; as such, it is espe- 
cially useful for studying merger trees and halo evolution (Behroozi 
et al. in prep.). Rockstar first selects particle groups with a 3D 
Friends-of-Friends variant with a very large linking length (b — 
0.28). For each main FOF group, Rockstar builds a hierarchy 
of FOF subgroups in phase space by progressively and adaptively 
reducing the linking length, so that a tunable fraction (70%, for this 
analysis) of particles are captured at each subgroup as compared 
to the immediate parent group. For each subgroup, the phase- space 
metric is renormalized by the standard deviations of particle posi- 
tion and velocity. That is, for two particles pi and p2 in a given 
subgroup, the distance metric is defined as: 

d{pi,p2)^(- 5-^ + ^ r^i ' (2) 



phase-space metric is set by halo properties, so that the distance 
between a halo h and a particle p is defined as: 



d{h,p) = 



where a^^ and are the particle position and velocity dispersions 
for the given subgroup. This metric ensures an adaptive selection 
of overdensities at each successive level of the FOF hierarchy. 

When this is complete, Rockstar converts FOF subgroups 
into haloes beginning at the deepest level of the hierarchy. For a 
subgroup without any further sublevels, all the particles are as- 
signed to a single seed halo. If the parent group has no other sub- 
groups, then all the particles in the parent group are assigned to the 
same seed halo as the subgroup. However, if the parent group has 
multiple subgroups, then particles are assigned to the subgroups' 
seed haloes based on their phase-space proximity. In this case, the 



\2 / \2\ 1/2 



(3) 



where rvir is the current virial radius of the seed halo and ct„ is the 
current particle velocity dispersion. This process is repeated at all 
levels of the hierarchy until all particles in the base FOF group have 
been assigned to haloes. Unbinding is performed using the full par- 
ticle potentials (calcu lated using a modified Barnes & Hut method, 
iBames & Huj (Il98d)): halo centres are defined by averaging parti- 
cle positions at the FOF hierarchy level which yields the minimum 
estimated Poisson error — which in practice amounts to averaging 
positions in a small region close to the phase-space density peak. 
For further details about the unbinding process and for details about 
accurate calculation of halo properties, please see Behroozi et al. in 
prep. 

Rockstar is a massively parallel code (hybrid 
OpenMP/MPI style); it can already run on up to 10^ CPUs 
and on the very largest simulations (> 10^" particles). Addi- 
tionally, it is very efficient, requiring only 56 bytes of memory 
per particle and 4-8 (total) CPU hours per billion particles in a 
simulation snapshot. The code is in the final stages of development; 
as such, the results in this paper are a minimum threshold for the 
performance and accuracy of the final version|f| 



3 THE DATA 

In order to study, quantify, and assess the differences between var- 
ious halo finding techniques we first have to define a unique set of 
test cases. In that regard we decided to split the suite of compar- 
isons into two major parts: 

• a well-defined mock haloes consisting of field haloes in isola- 
tion as well as (sub-)subhaloes embedded within the density back- 
ground of larger entities, and 

• a state-of-the-art cosmological simulation primarily focusing 
on the large-scale structure. 

We further restricted ourselves to analysing dark matter only 
data sets as the inclusion of baryons (especially gas and its addi- 
tional physics) will most certainly complicate the issue of halo find- 
ing. As most of the codes participating in this comparison project 
do not consider gas physics in the process of object identification 
we settled for postponing such a comparison to a later study. 

We further adopted the following strategy for the comparison. 
For the mock haloes each code was asked to return a list of parti- 
cles and the centre of the (sub-)halo as derived from applying the 
halo finder to the respective data set. These centres and particle lists 
were then post-processed by one single code deriving all the quanti- 
ties studied below. By this approach we aimed at homogenising the 
comparison and eliminating subtle code-to-code variations during 
the analysis process. However, we also need to acknowledge that 
not all codes complied with this request as they were not designed 
to return particle lists; those codes nevertheless provided the halo 
properties in question and are included in the comparison. 

For the comparison of the cosmological simulations each code 
merely had to return those halo properties to be studied, based upon 

* Those interested in obtaining a copy of the code as well as a draft of the 
paper should contact the author at behroozi@stanford.edu. Current accept- 
able input formats for simulation files are ART, GADGET-2, and ASCII. 
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each and every code individually. The idea was to compare the ac- 
tual performance of the codes in a realistic set-up without interfer- 
ence in the identification/analysis process. 

3.1 Mock Haloes 

In order to be able to best quantify any differences in the results re- 
turned by the different halo finders it is best to construct test scenar- 
ios for which the correct answer is known in advance. Even though 
we primarily aim at comparing fmax and the number of gravitation- 
ally bound particles we also want to have full control over various 
definitions of, for instance, virial mass, i.e. we require haloes whose 
density profile is well known. Additionally, as subhalo detection 
is of prime interest in state-of-the-art cosmological simulations we 
also place haloes within haloes within haloes etc. Further, sampling 
a given density profile with particles also gives us the flexibility to 
study resolution effects related to the number of particles actually 
used. 

We primarily used the functional form for the (dark matter) 
density profile of haloes orig inally proposed in a series of pa pers 
by Navarro, Frenk & White jNavarro et al.lll995l Il996l Il997l the 
so-called "NFW profile"), 

Pcrit r/rs(l-f r/rs)2 ' 
where pcrit is the critical density of the universe, is the scale 
radius and Sc is the characteristic density. NFW haloes are charac- 
terised by their mass for a given enclosed overdensity. 



(4) 



. ^ 47r 3 . 
Ma = — rAApcrit, 



(5) 



where A is a multiple of the critical density that defines the mag- 
nitude of the overdensity and is the radius at which this occurs. 
The characteristic density is then defined as, 



(6) 



3 ln(l + c)-c/(l + c)' 

where c = rA/rs is the concentration. The mock haloes were 
generated with using a predefined number of particles that repro- 
duced the NFW profile even though the consensus has moved away 
from the statement that dark matter haloes follow this particular 
profile all the way down to the centre. We are not interested in 
probing those very central regions where the density profile starts 
to deviate from the NFW form as found nowadays in c osmologi- 
cal simulations jStadel et al1l2009l : iNavarro et al1l2010l) . We need 
to stress that the position and size of the maximum of the rotation 
curve is in fact unaffected in all tests presented here. The velocities 
of the p articles were then assig ned using the velocity dispersion 
given m iLokas & MamorJ (1200 1[) and distributed using a Maxwell- 
Boltzmann function i HemquisJ 19931) PI 

In addition to mock haloes whose density profile is based upon 
the findings in cosmological simulations (at least down to those 
scales probed here) we also chose t o generate test haloes that follow 
a Plummer profile l |PlummeJll91 ih . 



(7) 



where M is the total mass and is the scale radius. The mock 
haloes were then produced again using a predefined number of par- 
ticles to reproduce the profile, but this time the velocities were ob- 
taine d using an isotropic, spheri cally symmetric distribution func- 
tion teirmev & Tremain3 Il987l) . The two major differences be- 
tween the Plummer and the NFW density profile are that for the 
former profile the mass converges and it contains a well defined 
constant-density core. This constant density may pose problems for 
halo finders as most of them rely on identifying peaks in the density 
field as (potential) sites for dark matter haloes. We stress that the 
Plummer spheres are intended as academic problems with no ob- 
served counter-part in cosmological simulations and hence only to 
be taken lightly and for information purposes; they may be viewed 
as a stability test for halo finders and as a trial how sensitive halo 
characteristics are against precise measurements of the centre. We 
will see that some properties can still be stably recovered even if an 
incorrect determination of the Plummer halo centre is made. 

As we also plan to study the accurate recovery of substructure 
we generated setups where one (or multiple) subhaloes are embed- 
ded within the density profile of a larger host halo. To this end we 
generate, for instance, two haloes in isolation: one of them (the 
more massive one) will then serve as the host whereas the lighter 
one will be placed inside at a known distance to the centre of its host 
and with a certain (bulk) velocity. The concentrations (i.e. the ratio 
between the virial and the scale radius) have been c hosen in order 
to me et the findings of cosmological simulations (e.g. lBullock et al.l 
l200lh . All our mock haloes are set-up with fully sampled 6D initial 
phase space distributions and every halo (irrespective of it becom- 
ing a host or a subhalo) has been evolved in isolation for several 
Gyrs in order to guarantee equilibrium. The mass of all particles in 
both the host halo and the subhalo are identical and all haloes have 
been sampled with particles out to 2 x i?ioo where i?ioo marks the 
point where the density drops below 100 x pcrit. For more details of 
the procedure and th e generation of the NFW haloes we would like 
to refer the reader to iMuldrew et al. ( 201 lb and for the generation 
of the Plummer spheres to lRead et aL ( |2006|) . 

The characteristics of the haloes are summarised in Table [T] 
We are aware of the fact that even though the radius at which the 
enclosed overdensity reaches some defined level is well-defined for 
our subhaloes when they were generated in isolation, such a defini- 
tion becomes obsolete once they are placed inside a host. However, 
we nevertheless need to acknowledge that such a definition may 
serve as a fair basis for the comparisons of the recovery of subhalo 
properties amongst different halo finders. 

Further, placing an unmodified subhalo at an arbitrary radial 
distance within a parent halo is also in part an academic exercise. 
It neglects that "real" subhaloes will always be tidally truncated. 
In that regards, it is not realistic to have an extended/untruncated 
subhalo at small distances to the host's centre. Some halo finders 
(e.g. SUBFIND) rely on the tidal truncation in order to be able to 
avoid a very large radially dependent bias in the amount of mass 
that can be recovered for a subhalo. 

For each of the two types of density profile we generated the fol- 
lowing setups: 



® We are awai'e that the velocity distribution is not derived from the full 
distribution function a nd that the M axwell-Boltzmann distribution is only 
an approximation (cf. iKazantzidis et al. 2004; Zemp et al. 2008). Despite 
this, it will have no effect on the ability of halo finders to recover the haloes 
as has been shown in[Muldrew et al. l201j|) where also more details about 
the generation of the mock haloes can be found. 



(i) isolated host halo 



(ii) isolated host halo + subhalo at O.dRl 



(iii) isolated host halo -I- subhalo at 0.5i?i 
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Table 2. The properties of the subhaloes for the NFW resolution study 
presented in Section |4. 1 .41 Radii are given in kpc, and velocities in 
km/sec. 



-'Vioo -'Vtot -RlOO fmax -Rvmax 



10 


13 


20.41 


18.24 


3.68 


20 


27 


25.72 


22.99 


4.62 


30 


41 


29.44 


26.31 


5.30 


40 


55 


32.40 


28.96 


5.85 


50 


68 


34.90 


31.20 


6.30 


100 


137 


43.98 


39.31 


7.93 


500 


687 


75.20 


67.21 


13.55 


1000 


1375 


94.74 


84.68 


17.08 



+ subsubhalo at (O.Si^Jod' +O.5i?ii5o ) 
(iv) isolated host halo + 5 subhaloes at various distances 

The (sub-)subhaloes were placed along the x-axis and given radi- 
ally infalling bulk velocities of 1000 km/sec for the subhalo and 
1200 km/sec for the subsubhalo, respectively. These velocities are 
typical for what you would expect in a dark matter host halo and 
were set to round numbers to make the analysis easier; their values 
were motivated by ^j2GM\^ost{< D)/D where D is the distance 
of the subhalo to the host's centre. 

The first three setups were used to study the overall recovery 
of (sub-)halo properties presented in Section |4.1.1| The fourth test 
has been used to study the radial dependence of subhalo properties 
introduced in Sec tion l4. 1.21 

Besides of the recovery of (sub-)halo properties we also aim 
at answering the question "How many particles are required to find 
a subhalo?". To this end we systematically lowered the number of 
particles (and hence also the subhalo mass as our particle mass re- 
mains constant) used to sample the subhalo listed above as test case 
#2. The properties of these mock subhaloes are summarised in Ta- 
ble[2]and the results will be shown in Section |4. 1.41 

Besides these well controlled tests we also performed a so- 
called "blind test" where the precise set-up of the data to be anal- 
ysed by each halo finder was unknown to the participants. We intro- 
duce this particular experiment alongside its results in a stand-alone 
Section [4X5] Only a small subset of the halo finders took part in 
this trial. 

We close this section with a cautionary remark that not all 
halo finders are ab initio capable of identifying subhaloes and hence 
some of the test cases outlined here were not performed by all the 
finders. Therefore some of the codes only contribute data points for 
the host halo in Section|4] 

3.2 Cosmological Simulation 

The cosmological simulation used for the halo-finder code com- 
parison project is the so-called MareNostrum Universe w hich was 
perfo rmed with the entropy conserving GADGET2 code dSpringell 
l2005h . It followed the nonlinear evolution of structures in gas and 
dark matter from z = 40 to the present epoch {z — 0) within a co- 
moving cube of side 500/i~^ Mpc. It assumed the spatially flat con- 
cordance cosmological model with the following parameters: the 
total matter density fim. = 0.3, the baryon density Qt, = 0.045, the 
cosmological constant SI a = 0.7, the Hubble parameter h = 0.7, 



the slope of the initial power spectrum n = 1, and the normal- 
isation as — 0.9. Both components, the gas and the dark mat- 
ter, were resolved by 1024'^ particles, which resulted in a mass 
of m-DM ~ 8.3 X lO^/i^^M© for the dark matter particles and 
Tigas ~ 1-5 X lO^/i^^M© for the gas particles, respectively. For 
more details we refer the reader to the paper that describes the simu- 
lation and presents results drawn from it (jCottlober & Yepes 200"^. 

For the comparison presented here we discarded the gas par- 
ticles as not all halo finders yet incorporate proper treatment of 
gas physics in their codes. The focus here lies with the dark mat- 
ter structures. However, to avoid that too many particles will be 
considered "unbound" (for those halo finders that perform an un- 
binding procedure), the masses of the dark matter particles have 
been corrected for this, i.e. m^^'^'^^^'^ = mDM/(l — ft) where 
fb = fib /fim is the cosmic baryon fraction of our model universe. 

In order to allow non-parallel halo finders to participate in this 
test we degraded the resolution from the original 1024"^ particles 
down to 512^ as well as to 256'^ particles. The properties to be com- 
pared will however be drawn from the highest-resolved data set for 
each individual halo finder, making the appropriate mass/number 
cuts when producing the respective plots. 

3.3 Code Participation 

Not all codes have participated in all the tests just introduced and 
outlined. Hence in order to facilitate an easier comparison of the 
results and their relation to the particular code we provide in Ta- 
ble [3] an overview of the tests and the halo finders participating in 
them. In that regard we also list for the cosmological simulation the 
respective resolution of the data set analysed by each code. The last 
two columns simply indicate whether the code performs an unbind- 
ing procedure and provided subhalo properties, respectively. 



4 THE COMPARISON 

This Section forms the major part of the paper as it compares the 
halo catalogues derived with various halo finders when applied to 
the suite of test scenarios introduced in the previous Section. We 
first address the issue of the controlled experiments brought for- 
ward in Section 143] followed by the analysis of the cosmological 
simulation introduced in Section |4~2l As already mentioned before, 
we are solely addressing dark matter haloes leaving the inclusion 
of baryonic matter (especially gas) for a later study. 

4.1 Mock Haloes 

Before presenting the results of the cross comparison we need to 
explain further the actual procedures applied. Each data set was 
given to the respective code representative asking them to return 
the centre of each object found as well as a list of the (possible) 
particles belonging to each (sub)halo. A single code only using that 
particular list was then used to derive the bulk velocity Vbuik, the 
(fiducial) mass M200, and the peak of the rotation curve Umax in 
order to eliminate differences in the determination of said values 
from code to code. Or in other words, we did not aim at compar- 
ing how different codes calculate, for instance, fmax or M200 and 
so eliminated that issue. This simple analysis routine is also avail- 
able from the project website. We were aiming at answering the 
more fundamental question "Which particles may or may not be- 
long to a halo?" according to each code. However not all represen- 
tatives returned particle lists as requested (due to a different method 
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Table 1. The properties of the (sub-)haloes for the study of recovered halo properties presented in Section |4. 1 . 1 l and Section 14.1.21 . The number of paiticles 
A'xxx counts all particles out to iJxxx where the density drops below xxx X Pcrit- Masses are given in /i^^Mq, radii in h^^ kpc, and velocities in km/sec. 
Please note that all haloes have been sampled out to 2 X Rioo and that the Plummet subsubhalo does not reach this overdensity and has been truncated at 
13.9h~^ kpc. The halo type indicates whether the halo is a host, a subhalo or a subsubhalo. i?s is the scale length of the appropriate halo type. 



profile type TVioo J\/ioo RlOO ^^200 M200 -R200 f'max 



NFW 


host 


106 


10" 


947.4 


760892 


7.61 xlQi^ 


689.1 


189.5 


715 




sub 


104 


IOI2 


204.1 


8066 


8.07 xlO" 


151.4 


17.0 


182 




subsub 


10^ 


lOlO 


44.0 


84 


8.42 xlO^ 


33.1 


2.6 


43 


Plummet 


host 


106 


10" 


947.0 


966326 


9.66 xlQis 


760.5 


190.0 


961 




sub 


lO"! 


I012 


204.0 


9937 


9.94 xlO" 


161.7 


17.0 


314 




subsub 


10^ 


10"' 


23.9 


100 


10.00x10^ 


23.9 


2.6 


79 



Table 3. Brief summary of the codes participating in the comparison project. The first six columns provide a synopsis of the respective tests the code pailicipated 
in (columns 2-7). The last two columns simply list whether the code performs an unbinding procedure and provided subhalo properties, respectively. 



code 


recovei7 


rad. depend. 


participation in test 
dyn. infall resolution 


bhnd 


cosmology 


unbinding 


subhaloes 


AHF 


yes 


yes 


yes 


yes 


yes 


10243 


yes 


yes 


ASOHF 


yes 


yes 


yes 


yes 


yes 


2563 


yes 


yes 


BDM 


yes 


yes 


yes 


yes 


yes 


5123 


yes 


yes 


pSO 


only host 


no 


no 


no 


only host 


10243 


no 


no 


LANL 


only host 


no 


no 


no 


no 


10243 


no 


no 


SOBFIND 


yes 


yes 


yes 


yes 


yes 


10243 


yes 


yes 


FOF 


yes 


yes 


yes 


yes 


no 


10243, no t;max 


no 


limited 


pFOF 


only host 


no 


no 


no 


no 


5123 


no 


no 


Ntropy-f of sv 


only host 


no 


no 


no 


no 


10243, no t;max 


no 


no 


VOBOZ 


yes 


yes 


no 


yes 


yes 


5123 


yes 


yes 


ORIGAMI 


no 


no 


no 


no 


no 


5123 


yes 


no 


SKID 


yes 


yes 


yes 


yes 


yes 


10243 


yes 


yes 


Adapt aHOP 


yes 


yes 


yes 


yes 


yes 


5123 


no 


yes 


HOT 


yes 


yes 


yes 


yes 


yes 


no 


no 


yes 


HSF 


yes 


yes 


yes 


yes 


yes 


10243 


yes 


yes 


6DF0F 


yes 


yes 


yes 


yes 


yes 


10243 


no 


yes 


Rockstar 


yes 


yes 


yes 


yes 


no 


10243 


yes 


yes 



or technical difficulties) but rather directly provided the values in 
question; those codes are BDM, FOF, and 6DF0F. Further, FOF did 
not provide values for Umax- 

And when comparing results we primarily focused on 
fractional differences to the theoretical values by calculating 
A2;/2;Modei = (a;codG — x-ModcO/sModoi where x is the halo prop- 
erty in question. 



4. 1. 1 Recovery of Host and Subhalo Properties 

For all the subsequent analysis and the plots presented in this sub- 
section 14.1.11 we used the the setups (i) through (iii) specified in 
Section 13.11 In that regard we have three host haloes (one for 
the host alone, one from the host-l-subhalo setup, and one from 
the host-l-subhalo-l-subsubhalo configuration); we further have two 
subhaloes at our disposal (one from the host-l-subhalo and one 
from the host-l-subhalo-l-subsubhalo tests) as well as one subsub- 
halo. In all figures presented below the origin of the halo is indi- 
cated by the size of the symbol: the largest symbol refers to the 
host-l-subhalo-l-subsubhalo set with the symbol size decreasing in 
the order of the host-l-subhalo towards the host test alone. We fur- 
ther always show the results for the NFW mock haloes in the left 



panel and the Plummer spheres in the right one. As much as possi- 
ble, the halo finders have been organised in tenns of their method- 
ology: spherical overdensity finders first followed by FOF-based 
finders with 6D phase-space finders last. 

Centre Determination We start with inspecting the recovery of 
the position of the haloes as practically all subsequent analysis as 
well as the properties of haloes depend on the right centre determi- 
nation. The results can be viewed in Fig. |2] where the y-axis repre- 
sents the halo finder and the a;-axis measures the offset between the 
actual position and the recovered centre in kpc. 

We can clearly see differences for all sorts of comparisons: 
host haloes vs. (sub-)subhaloes, NFW vs. Plummer model, and - 
of course - amongst halo finders. While for the NFW density pro- 
file the deviations between analytical and recovered centre are for 
the majority of haloes and codes below ~5h~^ kpc there are nev- 
ertheless some outliers. For the large halo the 100*'' particle is 
3/1^^ kpc from the nominal centre. These outliers are primarily 
for the FOF-based halo finders which are using a centre-of-mass 
rather than a density-peak as the centre. However, for a perfectly 
spherically symmetric setup as the one used here the differences 
between centre-of-mass and density peak should be small. Some of 
the finders (pSO, LANL, pFOF, ntropy-f of sv) were not 



© 2010 RAS, MNRAS 000.mt27l 



Halo-Finder Comparison Project 15 




1 10 100 

centre offset [kpc/h] 



1 10 100 

entre offset [kpc/fi] 



Rockstor - 
5DF0F - 




0.1 0.2 0.3 

^^bulk/^model 



Figure 2. The offset of the actual and recovered centres for the NFW (left) 
and Plummer (right) density mock haloes. The symbols refer to either the 
host halo, subhalo or subsubhalo as indicated while the symbol size indi- 
cates the test sequence as detailed in the text (i.e. larger symbols for haloes 
containing more subhaloes). 



designed to find substructure and so do not return the locations 
for these. Interestingly H0T6D cannot detect the NFW subsub- 
halo. The situation is a bit different for the Plummer model that 
consists of a flat density profile inwards from the scale radius of 
190/i~^ kpc. While the centre offset for the FOF finders remains 
the same we now also observe a shift towards larger offset-values 
for the majority of the other codes; some codes were even unable 
to locate the host halo at all (e.g. SKID) while other finders even 
marginally improved their (sub-)halo centre determination (AHF, 
ASOHF, H0T3D). Remember that for 6DF0F all positions and ve- 
locities were solely determined from the linked particles which ex- 
plains the slightly offset centres in cases only a few particles were 
linked (as in the case of the Plummer sphere which had an artificial 
low phase space density by construction) as well as the somewhat 
different bulk velocities (when compared to the other halo finders 
below). 

Halo Bulk Velocity A natural follow-up to the halo centre is to 
ask for the credibility of the bulk velocity of the halo. Errors in 
this value would indicate contamination from particles not belong- 
ing to the halo in question to be studied in greater detail in Sec- 
tion l4.1.4l below. In our test data the host is always at rest whereas 
the subhalo (subsubhalo) flies towards the centre with -1000 (- 
1200) km/sec along the negative a;-direction. The fractional dif- 
ference between the model velocity and the bulk velocity as mea- 
sured for each halo finder is presented in Fig. |3] Please note that 
we have normalised the host's velocities to the rotational veloc- 
ity at the J?ioo, i.e. ~1000 km/sec, for the two density profiles. 
Here we find that for practically all halo finders the error in the 
bulk velocity is smaller than 3%; only some outliers exist. Please 
note that we used all particles in the determination of the bulk ve- 
locities as returned/recovered by the respective halo finder. SKID 
displays very significant contamination in the recovered subhaloes 
with a 40% error in the recovered bulk velocity but is also one of the 
codes whose returned particle lists are intended to undergo signif- 
icant post-processing. AdaptaHOP and H0T3D have smaller but 
still significant levels of contamination within the returned sub- 
structures. The marginal offset in the bulk velocities of the host 
Plummer host haloes for 6DF0F and BDM is directly related to the 



Figure 3. Recovery of halo bulk velocities in comparison to the analytical 
input values for the NFW (left) and Plummer (right) density mock haloes. 
Note that the host halo has been set up to be at rest with J^buik = 0. The 
symbols have the same meaning as in Fig.|2] 

respective centre offsets seen in Fig. |2] those two codes base their 
bulk velocities on particles in the central regions. 

Number of Particles In Fig. |4] we are comparing the number of 
particles recovered by each halo finder to the number of particles 
within M200 listed in TableflFi We are aware that there is no such 
well defined radius for (sub-)subhaloes, but it nevertheless provides 
a well-defined base to compare against. 

We observe that while the errors are at times substantial for 
the NFW model the Plummer results appear to be more robust this 
time. But this is readily explained by the form of the applied den- 
sity profile: the variations in mass and hence number of particles 
are more pronounced for the NFW profile than for the Plummer 
model when changing the (definition of the) edge of a halo. Or in 
other words, the total mass of a Plunmier model is well-defined 
whereas the mass of an NFW halo diverges. Therefore, (minor) 
changes and subtleties in the definition of the other edge of a (sub- 
)halo will lead to deviations from the analytically expected value - 
at least for the NFW model. To this extent we also need to clarify 
that each halo finder had been asked to return that set of particles 
that was believed to be part of a gravitationally bound structure; 
participants were not asked to return the list of particles that made 
up A/200 - Post-processing of the supplied particle lists to apply this 
criterion results in errors for the NFW profiles that are well below 
10% - at least for the host haloes (cf. Fig. |5] below). However, a 
straight comparison of the number of recovered particles amongst 
the codes reveals a huge scatter. This is due to the fact that the indi- 
vidual codes are tuned to different criteria to define the edge of the 
halo. Clearly some codes (HSF , HOT , VOBOZ) have been tuned 
to extract an effectively smaller overdensity for this test than say 
6DF0F, LANL, pSO or AHF. This is a well known issue and all 
code developers are well aware of it. Perhaps more concerning is 
the wide scatter in relative mass of the largest subhalo. Here M200 
is ill-determined but the ratio of the substructure mass to the host 
halo mass displays a wide scatter. This ratio is a astrophysical im- 
portance for several issues. 

The difference in host halo seen for FOF and pFOF is - in 



Please note that in all subsequent plots we are using N200 when refer- 
ring to Afmodol- 
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Figure 4. Total number of particles recovered for the (sub-)halo for the 
NFW (left) and Plummer (right) density mock haloes with respect to the 
number of particles within M2oa- The symbols have the same meaning as 
in Fig. [2] 




-0.5 0.0 



1.0 -0.5 0.0 0.5 



Figure 5. A/200 mass (as determined from the supplied particle lists) mea- 
sured according to the mean enclosed density being 200 X Pcrit criterion 
for the NFW (left) and Plummer (right) density mock haloes extracted from 
each finder's list of gravitationally bound particles. The symbols have the 
same meaning as in Fig.|2] 



general - due to the choice of a linking-length not correspond- 
ing to 200 X pcrit- However, with an appropriate linking length 
the FOF algorithm detects the halo at the desired overdensity cor- 
rectly as can be seen for the host only and host-l-subhalo data for 
which there is agreement with the analytical expectation as opposed 
to the host-l-subhalo-l-subsubhalo where the standard linldng length 
has been applied and hence the number of particles (and mass) is 
over-estimated. As a (down-)tuned linking length has also been uti- 
lized for the detection of the (positions of the) subhaloes, the higher 
overdensity encompassed naturally led to a smaller number of par- 
ticles (and masses) than assumed in the model. 

Again, we stress that Fig. |4] does not necessarily reflect the 
number of particles actually used to calculate halo properties; it is 
the raw number of (bound) particles assigned to the centre of the re- 
spective (sub-)halo and used for further post-processing with most 
of the codes. But the comparison also indicates that neither num- 
ber of particles nor M as defined by some overdensity criterion (see 
below) are stable quantities for a fair comparison; this is why we ar- 
gue in favour of the peak of the rotation curve for cross-comparison 
as already highlighted in the introduction. 

Mass Using the particle lists provided by each halo finder we ex- 
tract each object and calculate the density profile. From this we 
determine the point where it drops below 200 x pcrit- This point 
can then be used as a radial distance within which to define M200 
which is then compared against the theoretical expectation (cf. Ta- 
ble [TJ in Fig.|5] Again, we acknowledge that this is not the coiTect 
definition for (sub-)subhalo mass, but can regardlessly be used to 
compare halo finders amongst themselves. 

As already outlined in the previous paragraph, the differences 
to the analytical values (and between the codes) are substantially 
alleviated now that differences in definition for the edge of each 
halo have been removed. The apparent underestimation of the (sub- 
)subhalo masses has also to be taken and digested carefully as the 
M200 values are based upon objects in isolation when these are em- 
bedded in a large host halo. However, please recall that the values 
for BDM, FOF, 5DF0F are based upon their respective criteria 
as these codes did not return particle lists but directly M200 • 

Amongst those codes that did recover subhaloes and under- 
went the same processing scheme there remains a surprisingly wide 
variation in recovered subhalo M200 mass. Almost all the codes 
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Figure 6. Recovery of numerical iimax values in comparison to the ana- 
lytical input values for the NFW (left) and Plummer (light) density mock 
haloes.The symbols have the same meaning as in Fig.|2] 



studied here post-process their subhalo catalogues heavily to alle- 
viate this problem. We would stress however that the precise defi- 
nition for a subhalo contents can, as demonstrated, lead to a range 
of recovered subhalo masses, a point users of subhalo catalogues 
should be well aware of. We will return to the issue of missing sub- 
halo mass in Section B. 1 .3 I below. which provides some explanation 
for the variation. 



Maximum of the Rotation Curve As outlined in Section 11.31 
A/200 does not provide a fair measure for (sub-)subhalo mass and 
hence we consider the maximum circular velocity Umax as a proxy 
for mass. The fractional difference between the theoretically de- 
rived «max and the value based upon the particles returned by each 
halo finder are plotted in Fig. |6] While we now find a considerably 
improved agreement with the analytical calculation the subsubhalo 
has still not been recovered coiTectly in most of the cases. This re- 
sult is entirely in line with the results of figure 7 of iMuldrew et al.l 
( I2OI ih where the error in measuring Umax for a range of particle 
numbers was calculated: we should not be surprised by a 10% un- 
derestimate for our subsubhalo as this is well within expected lim- 
its. 
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4.1.2 Radial Dependence ofSubhalo Properties 

The following test aims at studying how the recovered properties 
of a subhalo change as a function of the distance from the centre 
of the subhalo to the centre of its host. We always placed the same 
subhalo (sampled with 10000 particles) at various distances and ap- 
plied each halo finder to this test scenario, without changing the re- 
spective code parameters in-between the analyses. We then focused 
our attention on the number of gravitationally bound particles in 
Fig- 13 recovered i\/200 masses in Fig. [8] and the maximum of 
the rotation curve in Fig.|9] 

We reiterate that this particular test (as well as the following 
two) is only suited to halo finders that are able to identify substruc- 
ture embedded within the density profile of a larger encompassing 
object. Therefore, some of the codes will not appear in this and the 
following tests in Section |4. 1 .4| and Section |4.1.3l However, we also 
need to acknowledge that some of the code developers were keen 
to participate in this venture and manually tuned their halo finders 
to (at least) provide a centre (and possibly mass) estimate for the 
subhalo under investigation (e.g. FOF by Gottlober & Turachni- 
nov systematically lowered their linking length until an object had 
been found using the spin parameter as a measure for credibility (cf. 
Section [277t : however, as FOF in its basic implementation does not 
perform any unbinding they did not dispense particle lists and/or 
internal properties.). Therefore, the results for FOF are to be taken 
lightly and with care. 

Number of Particles Aside from the location of the substructure, 
which we are not investigating in more detail in this particular Sub- 
section, the number of particles recovered by each halo finder is 
the first quantity to explore as a function of subhalo distance. The 
results can be viewed in Fig. |7] with the NFW mock halo in the up- 
per panel and the Plummer sphere in the lower. Recall that there 
are five subhaloes placed at various distances from the centre of the 
host with the closest one actually overlapping with the host centre. 

As expected from the above results of the previous section 
(which equate to the middle position of these five haloes) the var- 
ious halo finders recover a range of number of particles within the 
halo. Only the phase space based finders are capable of disentan- 
gling the subhalo when it is directly in the centre. Even then their 
particle recovery either indicates that there are too few particles as- 
sociated with the subhalo or that they found the host. We further 
observe that, at least for the NFW haloes, the number of recovered 
particles drops the closer we get to the centre. This is naturally ex- 
plained by the fact that the density contrast of the subhalo becomes 
smaller and the point where the host halo's density takes over is 
closer to the centre of the subhalo. This is another reflection of the 
fact that the number of particles (or anything based upon a mea- 
sure of "halo edge") is not a good proxy for the actual subhalo. 
The situation is obviously different for the Plummer sphere with 
no pronounced density rise towards the centre; therefore, the sub- 
halo appears to be well recovered in this case. For the low number 
of particles recovered by SUBFIND we refer the r eader to an im- 

Siroyed discussion and investigation, respectively, in lMuldrew et al.l 
20lTI) . 

In any case, these are the still simply particle lists; we continue 
to check the (hypothetical) M200 values as well as the recovery of 
the maximum of the rotation curve. When defining a (hypothetical) 
M200 value considering the subhalo in isolation we find basically 
the same trends as for the number of particles. This can be veri- 
fied in Fig. [8] where we observe the same phenomena as in Fig. |7] 
However, SKID is the exception with the M200 values closer to the 




0.0 0.5 1.0 1.5 



Figure 7. Number of particles belonging to the subhalo for the NFW (up- 
per) and Plummer (lower) density mock haloes as a function of subhalo 
distance to the host. 



actual model mass across all distances than the number of particles, 
as expected and as they themselves would obtain during their own 
post-processing steps. 

We note that the discrepancy between the (fiducial) mass and 
the real mass of the subhalo placed at different radial distances from 
the centre is more serious in this idealised set-up than it would be 
in a realistic situation, where the substructures would experience 
tidal truncation in moving towards the inner regions of the halo 
(see the discussion in Section [3T1 as well as the study of the dy- 
namical subhalo infall in Section |4. 1 . 3 1 below) ; when considering 
the mass within the tidal truncation radius, the discrepancy between 
the "real" and recovered mass would reduce. 

The most credible measure of subhalo mass, however, appears 
to be the maximum of the rotation curve: it hardly changes its value 
irrespective of the position inside the host halo as can be seen in 
Fig.H] All halo finders perform equally well in recovering the Wmax 
value from the list of particles used in Fig. |7] This then indicates 
that the only difference amongst the halo finders as seen as a sub- 
stantial spread in (the upper panel of) Fig. |7] stems from the outer 
and less well contrasted regions of the subhalo. 
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Figure 8. Hypothetical A/200 value comparison to tlie NFW (upper) and 
Plummer (lower) sublialo as a function of distance to the host. Af200 was 
calculated again considering the recovered particles N (as presented in 
Fig.|7j in isolation. 

Maximum of Rotation Curve We have seen in Section |4.1.1| that 
the maximum of the rotation curve Vmax serves as an adequate 
proxy for mass and hence we test its sensitivity to radial position in 
Fig.|9] We find that this quantity is, as expected, hardly affected by 
the actual position of the subhalo within the host. Its value is deter- 
mined by the more central regions of the subhalo and hence does 
not change if the object is truncated in the outskirts due to embed- 
ding within the host's background density field. Only when the two 
centres of the sub- and the host halo overlap do we encounter prob- 
lems again, however, H0T6D and HSF even masters this situation 
fairly well (at least for the more realistic NFW test scenario). 

4.1.3 Dynamical Infall of a Subhalo 

The test described and analysed in this Subsection is a dynamic ex- 
tension of the previously studied radial distance test: we throw a 
subhalo (initially sampled with 10000 particles inside i\/ioo) into 
a host halo two orders of magnitude more massive. It was initially 
placed at a distance of D = 3 x i?ioo* with a radially inwards veloc- 
ity ofv = ^2GM{< D)/D = 686km/s and then left to free-fall. 
During the temporal integration of this system with GADGET-2 the 



Figure 9. Recovery of numerical J^max values in comparison to the analyt- 
ical input values for the NFW (upper) and Plummer (lower) density mock 
haloes as a function of subhalo distance to the host. 

cosmological expansion was turned off so the haloes were only af- 
fected by gravity. The orbit of the subhalo takes it right through the 
host halo centre, exiting on the other side. Due to the tidal forces 
the subhalo will lose mass and we aim at quantifying how different 
halo finders recover both the number of (bound) particles as well as 
the evolution of the peak rotational velocity. 

Evolution in Number of Particles In Fig.[To]we start again with 
the number of recovered particles this time as a function of time 
measured in Gyrs since the infalling object passed 2 x i?2oo*- Note 
the fractional difference AA'^/A'modci is measured with respect to 
the number of particles A'^modci prior to infall and that the analysis 
has only been performed over a certain number of output snap- 
shots and not every integration step. At the starting point we ob- 
serve again the same scatter in the number of particles as already 
found in Fie. 170 Until the passage through the very centre of the 
host halo after approximately 1.8 Gyrs we also find the expected 

However, when comparing Fig. |7] and Fig. 1101 one needs to bear in 
mind that the radial dependence of subhalo properties only extends out to 
fs 1.37 X R\^f whereas the lirst data point in Fig.[lO]is for 2 X ij|gg*. 
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drop in number of particles due to the stripping of tlie subhalo; 
however, as noted in Fig.|7]part of this drop can also be attributed 
to the subhalo moving deeper into the dense region of the host. This 
drop in particle number has a marginally different shape depending 
on the halo finder, and for ASOHF there is even a marginal rise. But 
this time actually all halo finders (except the phase-space finders 
H0T6D, HSF and 6DF0F, cf. Fig. [72l below) do lose the subhalo 
when it overlaps with the host halo - or at least are unable to de- 
termine its properties at that time (e.g. 5DF0F actually found the 
objects but could not assign the correct particles to it as the search 
radius for "subhalo membership" was practically zero). After the 
passage through the centre all halo finders identify the object again 
with more particles yet obviously not reaching the original level 
anymore. 

However, we also like to mention that after the core transi- 
tion of the subhalo we expect to find a more or less constant set of 
particles that remain bound to the subhalo: as the radial distance 
increases again there is no reason for the subhalo to lose additional 
mass. It seems clear that the majority of structure finders agree on 
this plateau value, but there are also some that return an unphysical 
result in this regime (e.g. both HOT codes as well as 6DF0F in the 
early phases). 

Please note again that none of the FOF-based halo finders is 
ab initio designed to locate substructure, but the FOF results have 
been included as this code was manually tuned to locate subhaloes 
(cf. Section l4T2t . 

Evolution of the Maximum of tlie Rotation Curve As we have 
seen before a number of times already, the number of particles has 
to be used with care as the actual halo properties will be based 
upon them, but the list has undeniably to be pruned and/or postpro- 
cessed. We therefore present in Fig. [TT] again the evolution of the 
maximum of the rotation curve which focuses on the more central 
regions of the subhalo and its particles. Here we can undoubtedly 
see that all halo finders perform equally well (again): they all start 
with a value equal to the analytical input value and drop by the same 
amount once the subhalo has left the very central regions again. 



However, the majority of the codes (except SUBFIND, HSF, and 
SKID) found a sharp rise of Umax right after the central passage. 

To gain better insight into this region we show in Fig. [12] a 
zoom into the timeframe immediately surrounding the central pas- 
sage, this time though using the distance (as measured by the re- 
spective halo finder) to the host centre as the x-axis. We attribute 
part of this rise to an inclusion of host particles in the subhalo's 
particle list to be studied in greater detail below in Section 14.1.41 
we can see that codes having problems with such contamination 
appear to show this rise too - even though not all of the codes 
showing this rise are amongst the list of finders showing contam- 
ination. However, this rise is also (or maybe even more) indica- 
tive of problems with the unbinding procedure: particles who have 
just left the subhalo (and are then part of the host) may still be 
considered bound depending on the particulars of the halo finder. 
For instance, AHF assumes a spherically symmetric object during 
the unbinding process which is obviously not correct for an object 
heavily elongated by the strong tides during the central passage. 
However, one should also bear in mind that a rise in Umax also oc- 
curs when the subhalo gets (tidal ly) compressed and hence -Rmax 
is lowered (cf. iDekel et al]|2003h even though this has not been 
seen in all (controlled) experime nts of this kind (e.g. lHavashi et al] 
I2OO3I : iKlimentowski et al]|2009l) . 

Finally we point out that the 2;-axis is based upon the distance 
to the host centre as measured by each individual halo finder; and it 
is rather obvious that all halo finders have recovered (more or less) 
the same distance for the subhalo. 

4.1.4 Resolution Study of a Subhalo 

While we have seen that there is little variation of the most stable 
subhalo properties with respect to distance to the host (i.e. iimax) 
we now investigate the number of particles required to (credibly) 
identify a subhalo. To this extent we used setup (ii) from the list in 
Section lSTI where we placed a single subhalo into a host halo at half 
the host's A/100 radius. But this time we also gradually lowered its 
mass and number of particles (keeping the mass of an individual 
particle constant). Even though it is meaningless to talk about i?200 
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Figure 12. The maximum of the rotation curve for the dynamical infall 
study as a function of distance (as measured by the halo finder) to the centre 
of the host - zooming into the region about the centre. 



Figure 13. Fractional difference between number of particles within the 
recovered -R200 and number of particles belonging to the halo as returned 
by the respective halo finder vs. the number of particles inside the subhalo. 



radii for subhaloes again, we are nevertheless comparing the num- 
ber of gravitationally bound particles, as returned by the respective 
halo finder, to the number of particles inside the subhaloes' i?200 
radius; remember that the subhaloes were generated in isolation and 
sampled out to 2x their A/ioo radius (cf. Section lJTt . 

Number of Particles The results of this resolution study can be 
viewed in Fig. [13] where we plot the fractional difference in the 
number of particles within -R200 against the number of particles in 
the subhalo. In this figure there are two important things to note 
and observe: a) the end point of each curve (towards lower particle 
numbers) marks the point where the respective halo finder was no 
longer able to identify the object and b) a constant line (irrespec- 
tive of being above, on top, or below the 0-line) means that for each 
particle number the error in the determination is equal. Again, prac- 
tically all halo finders perform equally well, i.e. they recover the in- 
put number of particles with a constant error across all values. Only 
the two HOT algorithms show a strong deviation due to the lack of 
an unbinding procedure. It is also interesting to compare the (inner) 
end point of the curves marking the number of particles for which 
a certain code stopped finding the subhalo: all of them were still 
able to identify the object with 50 particles. HSF and SKID actu- 
ally went all the way down to 10 particles with VOBOZ, 6DF0F, 
and Rockstar stopping at 20 particles, and AHF at 30. We need 
to stress that codes were asked not to alter their technical param- 
eters while performing this resolution study and hence some may 
in fact be able to recover objects with a lower number of particles 
than presented here. For instance, we are aware that SUBFIND (as 
well as AHF and ASOHF) is capable of going all the way down to 
20 particles, if the technical parameters are adjusted appropriately. 

In any case, we also observe that some codes show a rise in 
AN /Nmodai towards lower particle numbers (e.g. AdaptaHOP, 
HOT); could this be due to contamination from host halo particles? 
We will study this phenomenon in the following Subsection. 

Contamination by Host Particles Downsizing a subhalo yet still 
trying to pin-point it also raises the question how many of the re- 
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Figure 14. Fraction of host's particles identified to be part of the subhalo as 
a function of particles inside the subhalo. 



covered particles are actually subhalo and how many are host halo 
particles. We are in the unique situation to know both the id's of 
the sub- and the host halo and hence studied the "contamination" 
of the subhalo with host particles as a function of the number of 
(theoretical) subhalo particles in Fig. [14] We can see that the vast 
majority of the halo finders did not assign any host particles to the 
subhalo. However, some halo finders appear to have picked up a 
fraction of host particles possibly leading to differences in the sub- 
halo properties such as Umax investigated next. Note that the high 
contamination for AdaptaHOP is due to the lack of an unbinding 
procedure. 
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Figure 15. Fractional difference between theoretical maximum of the rota- 
tion curve and the numerically derived maximum vs. the theoretical maxi- 
mum for the subhalo. 

Maximum of Rotation Curve As the number of particles is 
merely a measure for the cross-performance of halo finders and 
not (directly) related to credible subhalo properties we also need 
to have a look at t'max again. The fractional error as a function of 
the (theoretical) number of subhalo particles is plotted in Fig. [Ts] 
We note that aside from those halo finders who showed a contami- 
nation by host particles all codes recover the theoretical maximum 
of the rotation curve down to the limit of their subhalo's visibil- 
ity (although possibly the last data point for the lowest number of 
particles should be discarded in that regard). 

4.1.5 The "Blind Test" 

Aside from the mock haloes analysed before we also designed a 
particular test where none of the participants had foreknowledge 
of what it contained; only Stuart Muldrew, who generated all the 
mock haloes, knew the setup that is summarised in Table |4] where 
the type "host" refers to the host halo and "sub" to a subhalo. We 
dubbed this individual test the "blind test". Please note that some 
of the subhalo's density profiles in this test followed a Hemquist 
model ( lHemauist|[l99a , marked "Hem" in the Table,) instead of 
the NFW profile. Further, two haloes were deliberately placed at 
the same location yet with diametrically opposed velocities. 

As this test more or less marked the end of the workshop and 
was primarily considered a fun exercise, we did not include it in 
the actual data set presented in Section lSTI Please note that not all 
halo finders participated and that we did not give the players in the 
game a chance to tune their code parameters to the data set. Never- 
theless we decided to simply show visual impressions of those who 
returned results in Fig. [16] There we merely show the projections 
of the (fiducial) R200 and J?vmax radii in the x — y plane as the z 
coordinate of all haloes is identical. 

It is interesting to note that the phase-space halo finders were 
again capable of locating the two overlapping subhaloes even 
though this is not clearly visible in the projection (as their circles 
are obviously overlapping). Of the 3D finders SKID noticed that 
there was something odd at that position, returning one object with 



Table 4. Summary of the haloes in the blind test. Positions are given in 
h^^ Mpc and velocities in km/sec. 
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Figure 16. Visual impression of the "blind tesf (projection into x — y 
plane). Each halo found is represented by a circle with a radius equal to the 
fiducial R200 value (solid black) and the iJvmax value (dashed red). 

double the mass (and 7?vmax extending out to the outer radius). All 
other halo finders only found one of the two subhaloes. Also re- 
member that pSO is not (yet) designed to find subhaloes and hence 
only the host has been returned. It is further remarkable that none of 
the halo finders had trouble finding the two small subhaloes while 
the host had not been found for some of the codes. 

Again, we would like to stress that this test should not be taken 
too seriously. However, we nevertheless remark that analysing a 
cosmological simulation is also a sort of "blind" analysis as the 
answer is not previously known. 

4.2 Cosmological Simulation 

We now turn to the comparison of a real cosmological simulation 
including a substantial number of objects formed and embedded 
within the large-scale structure of the Universe. 

However, even though the simulation contains a large number 
of particles (i.e. up to 1024"^ in the highest resolved data set) the 
given volume of side length 500/i^ ^ Mpc does not allow for a study 
of subhaloes in detail: for the fiducial 512'^ particle run the largest 
object in the simulation box merely contains of order 10 subhaloes 
with the number of substructure objects dramatically decreasing 
when moving to (potentially) lower mass host haloes. We therefore 
stress that this particular comparison only focuses on field haloes 
and hence is well suited even for those codes that (presently) cannot 
cope with subhaloes. 

Further, as mentioned already in Section lT2l we have the data 
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Figure 17. Upper panel: the cumulative mass (A/200) function. The an'ows indicate the 50 particle limit for the 1024'^ (left), 512^ ( middle), and 256'^ (right) 
simula tion data. The thin black lines crossing the whole plot corresponds to the mass function as determined bv lWarrenet"alH2006l (solid)) and lxinker et alj 
(dashed)). The error bars represent the mean mass function of the codes (ilc). Lower panel: the fractional difference of the mean and code halo mass 
functions. For more details please refer to the text. 



available at various resolutions ranging from 256^ to 1024^ parti- 
cles. We decided to use the highest resolution analysis performed 
by each finder as has already been summarised in Table|3]in the sub- 
sequent comparison plots. The analysis in this particular Section 
primarily revolves around the (statistical) recovery of halo prop- 
erties. In that regard we are nevertheless limiting our analysis to 
properties akin to the ones already studied in Section liTI namely 
the mass M, the position R, the peak of the rotation curve timax, 



and the (bulk) velocity Vbuik- We are going to utilise masses as 
defined via 200 x pcrit, i-c. A/200. 

We like to re-iterate at this point again that for this particu- 
lar comparison each halo finder returned halo properties as derived 
from applying the code to the actual data set; we aim at comparing 
the results of the codes for each and every single one being applied 
to the data individually. We consider this the most realistic com- 
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parison as this directly gauges tiie differences of the resulting halo 
catalogues. 

We have already seen that all halo finders are capable of recov- 
ering the mass of mock haloes, irrespective of whether the density 
profile is cored or has a cusp (cf. Fig.|5). We therefore do not expect 
to find surprising differences in the first and most obvious com- 
parison, i.e. the (cumulative) mass function presented in Fig. [TT] 
Please note that pFOF discarded objects below 100 particles and 
hence did not return haloes below ~ 8 x lO^^/i~^M0; similarly, 
pSO discarded objects with fewer than 50 particles, accord ing to 
the criterion laid out in equation (30) of lLukic et al] ( l2007h . And 
in each case the (cumulative) mass function starts to flatten at ap- 
proximately the resolution limit of the simulation analysed by the 
respective code. 

However, ORIGAMI seems to miss some low-mass structures 
caught by other halo-finders. One possible reason is that some 
smaller density enhancements seen by other finders have not un- 
dergone shell-crossing along three axes, and therefore do not meet 
ORIGAMI's definition of a halo. Another is that ORIGAMI may be 
missing many subhaloes, which it does not attempt to separate from 
parent haloes. 

Further, the LANL halo finder is designed to be an FOF finder 
and, if needed, SO objects are defined on top of such friends-of- 
friends haloes. Thus, for smaller haloes completeness is an issue as 
not every SO halo will have an FOF counterpart. Of course, it is 
possible to run the code in the limit fe — > and Nmin = 1, having 
each particle serving as a potential centre of an SO halo, but the in- 
crease in computational cost would make this impractical, as direct 
SO halo finders which do precisely this in a more effective manner 
already exist. Nevertheless, we can see that computationally very 
fast method of growing SO spheres on top of FOF proxy haloes 
result in excellent match when compared to direct SO finders for 
well sampled haloes (~500 particles per halo). 

In order to better view (possible) differences in the mass func- 
tions we further calculated the "mean mass function" in 10 logarith- 
mically placed bins across the range 2 x lO" - 1 x IO^^/i^^Mq 
alongside la error bars for the means. Note that all codes only con- 
tributed to those bins where their data set is considered complete. 
We further deliberately stopped the binning at 1 x lO^^/i~^M0 
to not be dominated by small number statistics for the few largest 
objects. The results can also be viewed in Fig. [17] too, where we 
also show in the bottom panel the fractional difference between the 
mean and the code mass functions across the respective mass range. 
And we additionally added as thin solid black line to the actual 
mass function plot in the up per panel of Fig. 1 171 the numerically 
determined mass function of I Warren et al.l ( l2006h which is based 
upon a suite of sixteen 1024'^ simulations o f the ACDM universe 
as well as the one derived by iTinker et aL I ( l2008h derived from a 
substantial se t of cosmolog i cal sim ulations actually including the 
ones used by I Warren et al.l ( l2006h (cf. their Fig.l). Note that the 
former is based upon FOF and the latter on SO masses. 

As highlighted in the Introduction 11.31 the peak value of the 
rotation curve may be a more suitable quantity to use when it comes 
to comparing the masses of (dark matter) haloes. We therefore show 
in the Fig. [TS] the cumulative distribution of i^max. Apart from the 
expected flattening at low i)max due to resolution we now note that 
this is in fact the case: codes that did not estimate masses according 
to the standard definition A/(< R) — Air/S R'^ Ap nevertheless 
recovered the correct Umax values. Given the ability of comparing 
t'max to observational data (cf. Section fOt we conclude that Umax 
is a more meaningful quantity which can serve as a proxy for mass. 
Please note again the flattening of some curves at the low-iJmax 





10"* 


KHF 






ASOHF 






EDM 




10"* 


pSO 


a. 




. . . LAhJL 






SUBFIND 




10"* 


— pFOF 






VOBOZ 






ORIGAMI 




10"' 


SKID 




AdapLaHOF 
_HSF 
..tJDFOF 
.Rockstar 



[km/sec] 

Figure 18. The cumulative iJmax function. 

end due to either the resolution of the simulation analysed or an 
imposed minimum number of particles cut and that not all FOF- 
based finders returned a Wmax value. 

We have seen in Section|4T|that there exists some scatter be- 
tween halo finders in the recovery of the halo position. It therefore 
appears mandatory to check for differences in halo positions recov- 
ered from the cosmological simulation, too. To this extent we cal- 
culated the 2-point correlation function and present the results in 
Fig. [m In order to analyse a comparable data set (remember that 
some codes analysed the 1024"^, some the 512'^, and some the 256'^ 
particle simulation) we restricted the haloes to the 10000 most mas- 
sive objects and found excellent agreement!^ The smallest scale 
considered in this comparison is 2h~^ Mpc in order not to probe 
the interiors of galaxy clusters. The minute drop of the correlation 
function for pFOF at the smallest scale probed may be explained 
by the usage of the marginally larger linking length of b = 0.2 ap- 
plied during their analysis and the fact that pFOF uses the centre of 
mass instead of the density peak as the centre of the halo. 

Finally we cross-compare the bulk velocities of haloes in 
Fig. [20] where we find excellent agreement. We further give in the 
legend the medians of the distribution for each halo finder: the mean 
(of the medians) is 489 km/sec with a 1 — cr of 9 km/sec (i.e. 2% 
deviation). 



5 SUMMARY & CONCLUSIONS 

We have performed an exhaustive comparison of 18 halo finders 
for cosmological simulations. These codes were subjected to var- 
ious suites of test scenarios all aimed at addressing issues related 
to the subject of identifying gravitationally bound objects in such 
simulations. 

The tests consisted of idealized mock haloes set up accord- 
ing to a specific matter density profile (i.e. NFW and Plummer) 
where we studied isolated haloes as well as (sub-)subhaloes. We 



Please note that it makes little difference to use the 10000 objects with 
the largest t)max value as there is a strong correlation between M and iimax 
for each code. In the end we are interested in limiting the analyses to the 
most massive objects and hence a "mis-calculation" of the mass is irrelevant 
as long as differences in mass are systematic as in our case. 
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Figure 19. The 2-poiiit correlation function for tfie 10000 most massive 
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Figure 20. The distribution of bulk velocities for objects more massive than 
5 X lO"/i-iM0. 



further utilized a cosmological simulation of the large-scale struc- 
ture of the universe primarily containing field haloes. The require- 
ment for the mock haloes was to simply return the centres of the 
identified objects alongside a list of particles (possibly) belonging 
to that halo. We then applied a universal tool to calculate all other 
quantities (e.g. bulk velocity, rotation curve, (virial) mass, etc.). For 
the cosmological data the code representatives were simply asked 
to return their "best" values for a suite of canonical values. 

Mock Haloes We found that the deviation of the recovered po- 
sition to the actual centre of the object is largest for FOF-based 
methods which is naturally explained by the fact that they define 
centres as centre-of-mass whereas most other codes identify a peak 
in the density field. Further, dark matter haloes that have an intrinsic 
core (e.g. a Plummer sphere) yield larger differences between the 
input centre and the recovered centre for most codes. Such density 
profiles are not expected within the Universe we inhabit. However, 
the bulk velocities, (virial) masses, and iJmax values satisfactorily 



agreed with the analytical input irrespective of the underlying den- 
sity profile - at least for host and subhaloes; sub-subhaloes still 
showed at times departures as large as 50% in mass and 20% for 
'^max- Please note that all results are based upon the same post- 
processing software and only the list of particles (and the centre) 
were determined by each halo finder individually. Hence, variations 
in the centre will automatically lead to differences as both mass and 
rotation curve are spherically averaged quantities. 

We further investigated the dependence of subhalo properties 
upon the position within the host, in particular its distance to the 
centre. There we found that - while all codes participating in this 
exercise recovered excellent ?;max values for a NFW subhalo sam- 
pled with 10000 particles inside a NFW host two orders of mag- 
nitude more massiv - phase-space finders excelled by also lo- 
cating the subhalo when it overlapped with the centre of the host. 
However, in this case they struggle to properly calculate its proper- 
ties. 

Putting a subhalo at vaiying positions inside a host is closely 
related to a subhalo actually falling into a host. However, the latter 
also introduces distortions in the shape of the subhalo due to tidal 
forces while it is plunging through the background potential of the 
host. We performed a simulation of the scenario where a subhalo 
initially containing 10000 particles shoots right through the centre 
of a host two orders of magnitude more massive. While we found 
that the number of particles significantly drops when the subhalo 
approaches the host's centre, it rises again to a plateau level after 
the central passage - and this is apparent in all codes. The peak of 
the rotation curve, which should be less susceptible to (tidally in- 
duced) variations in the outer subhalo regions, shows less variation. 
However, Umax actually rises shortly after the subhalo leaves the 
very central region indicative of two (related) effects: contamina- 
tion with host particles and problems with the unbinding procedure. 
Nevertheless, these problems are (still) common to all halo finders 
used in this particular study and they all mutually agree upon the 
initial and final value. 

Another question addressed during our tests with the mock 
haloes was the number of particles required in a subhalo in order to 
still be able to separate it from the host background. To this extent 
we successively lowered the number of particles used to sample a 
subhalo that had been placed at half the A/ioo radius of the host. 
We found that the majority of finders participating in this exercise 
are capable of identifying the subhalo down to 30-40 particles. Yet 
again, (most of) the phase-space finders even locate the object with 
as few as 10-20 particles. Some of the configuration space finders 
also tracked down the subhalo to such low numbers of particles, 
however, they did not obtain the correct particle lists leading to 
subhalo properties that differ from the analytical input values. 

We would like to close this part of the summary with the no- 
tion that while there is a straight-forward relation between (virial) 
mass and the peak of the rotation curve for isolated field haloes 
(once the density profile is known), the mass of a subhalo is more 
ambiguously defined. As we have seen, it is (in most situations) 
more meaningful to utilize the peak of the rotation curve as a proxy 
for mass (cf. Fig.[8]vs. Fig.|9]as well as Fig.[TO]vs. Fig. II It. How- 
ever, as could also be witnessed in Fig.[TT] quite a number of halo 
finding techniques gave rise to an artificial increase of Wmax right 
after the passage through the centre of its host obscuring its appli- 
cability as a mass representative. 



^■^ Note that only halo finders capable of identifying substructure can par- 
ticipate in a comparison of (sub-)subhalo properties. 
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Cosmological Simulation As a matter of fact there is little to say 
regarding the comparison of the cosmological data set; as can be 
seen in Figs. I17lthroughl20l the agreement is well within the (omit- 
ted) error bars for the basic properties investigated here (i.e. mass, 
velocity, position, and Umax). And unless we can be certain which 
halo finding technique is the ultimate (if such exists at all), the ob- 
served scatter indicates the accuracy to which we can determine 
these properties in cosmological simulations. We would though like 
to caution that the haloes found within the cosmological simulation 
are primarily well defined and isolated objects and hence it is no 
suiprise that we find such an agreement. Subhaloes, however, are 
not well defined and therefore lead to larger differences between 
halo finders as seen during the comparison of the mock haloes. For 
those codes that diverge from the general agreement the differences 
are readily explained and have been discussed in Section l431 

Concluding Remarl^s The agreement amongst the different codes 
is rather remarkable and reassuring. While they are based upon dif- 
ferent techniques and - even for those based upon same techniques 
- different technical parameters they appear to recover compara- 
ble properties for dark matter haloes as found in state-of-the-art 
simulations of cosmic structure formation. We nevertheless need to 
acknowledge that some codes require improvement. For instance, 
phase-space finders find halo centres even if the centre overlaps 
with another (distinct) object and recover subhaloes to smaller par- 
ticle number, however they still have problems with the (separated) 
issue of assigning the correct particles in these cases and hence de- 
riving halo properties afterwards. 

We close with the remark that we deliberately did not dwell 
on the actual technical parameters of each and every halo finder as 
this is beyond the scope of this paper and we refer the reader to 
the respective code papers for this. However, it is important to note 
that with an appropriate choice of these parameters the results can 
be brought into agreement. This is an important message from this 
particular study. We are not claiming that all halo finders need to 
return identical results, but they can (possibly) be tuned that way. 
In that regards we also like to remind the reader again that this 
particular comparison is aimed at comparing codes as opposed to 
algorithms; we even tried to gauge the differences found when ap- 
plying codes based upon the same algorithm to identical data sets. 
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